Apache configuration.


Installing and Administering Drupal 


Now that we have Apache, PHP, and MySQL running, let’s install a package that 
uses them. Sadly, we don’t get paid for product placement here, so we’ll choose 
something that’s open source, big enough to represent typical real-world software, 
and useful in its own right. According to its web site (http://www.drupal.org): 
Drupal is software that allows an individual or a community of users to easily publish, 
manage and organize a great variety of content on a website. 
This includes weblogs, forums, document management, galleries, newsletters, and 
other forms of web-based collaboration. 


The following two sections describe two installation methods for Drupal: 


apt-get 
Easier, so try this first. However, we’ve had some problems with Debian Drupal 
packages. 
Source
More work, but you can see what’s happening; try this if the apt-get method 
fails. 


Installing Drupal with apt-get 


The easiest way to install Drupal is with apt-get. You can go to the Drupal web site 
and look for a package to download, or you can ask apt-cache whether it’s in a 
Debian repository: 

# apt-cache search drupal 

drupal - fully-featured content management/discussion engine 


drupal-theme-marvinclassic - "Marvin Classic" theme for Drupal 
drupal-theme-unconed - "UnConeD" theme for Drupal 


The first one is what we want, so let’s install it: 
# apt-get install drupal 


The installation process tells you that it needs some packages you don’t have, gets 
them, and chatters some more as it installs them. Then it asks you to configure 
Drupal through a sequence of text menus. Use the Tab key to move between choices, 
the Space bar to toggle a choice, and Enter to go to the next page. We’ll include only 
the last line or two of each screen here, and the recommended responses: 


Automatically create Drupal database? 
Yes 


Run database update script? 
Yes 


Database engine to be used with Drupal 
MySQL


Database server for Drupal's database 
localhost 


Database server administrator user name on host localhost 
root 


Password for database server administrator root on localhost 
newmysqlpassword 


Drupal database name 
drupal 


Remove Drupal database when the package is removed?
No 


Remove former database backups when the package is removed? 
Yes 


Web server(s) that should be configured automatically
apache
apache-ssl
apache-perl
apache2


The installation will copy the program files, create a MySQL database, and create an 
Apache configuration file (/etc/apache2/conf.d/drupal.conf): 


Alias /drupal /usr/share/drupal
<Directory /usr/share/drupal/>
    Options +FollowSymLinks
    AllowOverride All
    order allow,deny
    allow from all
</Directory>


If you run into an odd complaint like this one: 


An override for "/var/lib/drupal/files" already exists, but -force 
specified so lets ignore it. 


you can smack your head repeatedly as we have, or install from source. If everything 
looks good, skip the next section. 


Installing Drupal from Source 


Download the latest source distribution and move its directory to your web
document root directory:

# wget http://ftp.osuosl.org/pub/drupal/files/projects/drupal-4.7.3.tar.gz 

# tar xvzf drupal-4.7.3.tar.gz 


# mv drupal-4.7.3 /var/www/drupal 
# cd /var/www/drupal 


We'll excerpt the installation steps from INSTALL.txt and INSTALL.mysql.txt.
Create the Drupal database (we’ll call it drupal), administrative user (also drupal, since
we have no imagination), and administrative password (please use something other
than drupalpw):

# mysql -u root -p 

Enter password: 


Welcome to the MySQL monitor. Commands end with ; or \g. 
Your MySQL connection id is 37 to server version: 4.0.24 Debian-10sarge2-log 


Type 'help;' or '\h' for help. Type '\c' to clear the buffer. 


mysql> create database drupal; 
Query OK, 1 row affected (0.00 sec) 


mysql> GRANT SELECT, INSERT, UPDATE, DELETE, CREATE, DROP, 
-> INDEX, ALTER, CREATE TEMPORARY TABLES, LOCK TABLES 
-> on drupal.* to 
-> "drupal"@"localhost" identified by "drupalpw"; 
Query OK, 0 rows affected (0.01 sec)


mysql> FLUSH PRIVILEGES; 
Query OK, 0 rows affected (0.00 sec) 


mysql> quit; 
Bye 
Next, load the Drupal database definitions into MySQL: 


# mysql -u root -p drupal < database/database.4.0.mysql 
Enter password: 
# 


Then edit the file sites/default/settings.php and change the line:

$db_url = 'mysql://username:password@localhost/databasename';

to:

$db_url = 'mysql://drupal:drupalpw@localhost/drupal';
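
If you would rather script this edit than make it by hand, the substitution can be sketched as a small shell function. The function name set_db_url is our own invention, not part of Drupal, and the sketch assumes the password contains no characters that are special to sed (such as | or &):

```shell
# Hypothetical helper (not part of Drupal): rewrite the $db_url line in place.
# Usage: set_db_url FILE USER PASSWORD HOST DATABASE
set_db_url() {
    file=$1
    url="mysql://$2:$3@$4/$5"
    # Use | as the sed delimiter because the URL itself contains slashes.
    # \$ in the pattern matches the literal dollar sign of the PHP variable.
    sed "s|^\\\$db_url = .*|\$db_url = '$url';|" "$file" > "$file.new" &&
        mv "$file.new" "$file"
}

# Example: set_db_url /path/to/drupal/settings/file drupal drupalpw localhost drupal
```

Writing to a temporary file and renaming it avoids relying on sed's -i option, which not every sed supports.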


Configuring Drupal 


In your web browser, go to http://server1.centralsoft.org/drupal. The first page (in the 
version we tested) says: 

Welcome to your new Drupal website! 

Please follow these steps to set up and start using your website: 

Create your administrator account 

To begin, create the first account. This account will have full administration rights 

and will allow you to configure your website. 
Click on the “create the first account” link. On this second page, type your desired 
account name (or your full name) in the “Username” text field and your email 
address in the “E-mail address” field. Then press the “Create new account” button. 
You'll be sent back to the first page, which now says at the top: 


Your password and further instructions have been sent to your e-mail address. 


Check your email for the generated one-time password, and log into Drupal in the 
“User login” area. You'll be sent to a page to specify a permanent password. After 
setting this, you can go to your home page, where you'll see these choices: 

1. Create your administrator account 


To begin, create the first account. This account will have full administration 
rights and will allow you to configure your website. 


2. Configure your website 


Once logged in, visit the administration section, where you can customize and 
configure all aspects of your website. 

3. Enable additional functionality

Next, visit the module list and enable features that suit your specific needs. You
can find additional modules in the Drupal modules download section.

4. Customize your website design

To change the “look and feel” of your website, visit the themes section. You may
choose from one of the included themes or download additional themes from
the Drupal themes download section.


5. Start posting content 


Finally, you can create content for your website. This message will disappear 
once you have published your first post. 


For more information, please refer to the Help section, or the online Drupal 
handbooks. You may also post at the Drupal forum or view the wide range of 
other support options available. 


Since you’ve already created the first (administrator) account, you’re now on your 
own to try all the other functions. Drupal on. 


Troubleshooting 


If you like diagnosing problems, you'll love the Web. There are so many things to 
break, in so many places and in so many ways, that you'll be kept busy for ages. 


Let’s look at some classic web problems. (The browser error messages are those used 
by the Firefox browser, but Internet Explorer’s messages are similar.) 


Web Page Doesn't Appear in Browser 


Let’s assume your document root is /var/www, your file is test.html, and your server
is server1.centralsoft.org. When you use an external web browser to access
http://server1.centralsoft.org/test.html, you get an error page in your browser window.


A browser error message like “Server Not Found” implies a DNS problem. First, 
ensure that server1.centralsoft.org has DNS entries in a public nameserver: 


# dig server1.centralsoft.org 


;; ANSWER SECTION:
server1.centralsoft.org. 106489 IN A 192.0.34.166 


Then see whether the server can be reached from the Internet. If your firewall allows 
pings, poke the server from the outside to see if it’s alive: 


# ping server1.centralsoft.org 

PING server1.centralsoft.org (192.0.34.166) 56(84) bytes of data. 

64 bytes from server1.centralsoft.org (192.0.34.166): icmp_seq=1 ttl=49 time=81.6 ms


Check that port 80 is open and not blocked. From an external machine, try nmap:


# nmap -P0 -p 80 server1.centralsoft.org


Starting nmap 3.81 ( http://www.insecure.org/nmap/ ) at 2006-07-25 23:50 CDT 
Interesting ports on server1.centralsoft.org (192.0.34.166): 

PORT STATE SERVICE 

80/tcp open http 


Nmap finished: 1 IP address (1 host up) scanned in 0.186 seconds 


If you don’t have nmap, pretend to be a web browser. Use telnet to connect to the 
standard web port (80) and make the simplest HTTP request possible: 


# telnet server1.centralsoft.org 80 
Trying 192.0.34.166... 
Connected to server1.centralsoft.org. 


Escape character is '^]'.
HEAD / HTTP/1.0 


HTTP/1.1 200 OK 

Date: Wed, 26 Jul 2006 04:52:13 GMT 

Server: Apache/2.0.54 (Fedora) 
Last-Modified: Tue, 15 Nov 2005 13:24:10 GMT 
ETag: "63ffd-1b6-80bfd280" 

Accept-Ranges: bytes 

Content-Length: 438 

Connection: close 

Content-Type: text/html; charset=UTF-8 


Connection closed by foreign host. 
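
The same probe can be scripted. The following is only a sketch: head_request and status_code are names we made up, and the commented example assumes that nc (netcat) is installed and that the server is reachable.

```shell
# Print a minimal HTTP/1.0 HEAD request for the given path (default /).
# HTTP lines end in CRLF, and a blank line ends the request.
head_request() {
    printf 'HEAD %s HTTP/1.0\r\n\r\n' "${1:-/}"
}

# Read an HTTP response on stdin and print the three-digit status code
# taken from the first line (for example, 200 or 404).
status_code() {
    sed -n '1s/^HTTP\/[0-9.]* \([0-9][0-9][0-9]\).*/\1/p'
}

# Example (assumes nc is installed):
# head_request / | nc server1.centralsoft.org 80 | status_code
```

A status of 200 means Apache answered normally; no output at all means that no HTTP response came back.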
If that doesn’t work, make sure this line is in /etc/apache2/ports.conf: 
Listen 80 


and see whether anything else is hogging port 80: 














# lsof -i :80 

COMMAND PID USER FD TYPE DEVICE SIZE NODE NAME 

apache2 10678 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 10679 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 10680 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20188 root 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20190 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20191 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20192 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20194 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20197 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20198 www-data 3u IPv6 30079 TCP *:www (LISTEN) 
apache2 20199 www-data 3u IPv6 30079 TCP *:www (LISTEN) 


If you don’t see apache2 in this output, find out whether Apache is running: 


# ps -efl | grep apache2 
If the output contains lines like this:


5 S root 7692 1 0 76 0 - 2991 415244 Jul16 ? 00:00:00 /usr/sbin/apache2 -k start -DSSL


Apache is running. If it isn’t, kick it in the pants: 
# /etc/init.d/apache2 start 

Then run the ps command again. If Apache still does not appear, look at the error log: 
# tail -f /var/log/apache2/error.log


If you don’t have permission to view this file, you’re definitely having a hard day. If
the error log is empty, it may also have the wrong permissions. Confirm that the
/var/log/apache2 directory and the /var/log/apache2/error.log file exist:


# ls -l /var/log/apache2 


total 84 

-rw-r----- 1 root adm 31923 Jul 25 23:09 access.log
-rw-r----- 1 root adm 32974 Jul 22 20:50 access.log.1
-rw-r----- 1 root adm 379 Jul 23 06:25 access.log.2.gz
-rw-r----- 1 root adm 1969 Jul 25 23:09 error.log
-rw-r----- 1 root adm 1492 Jul 23 06:25 error.log.1
-rw-r----- 1 root adm 306 Jul 23 06:25 error.log.2.gz


If the tail of the error log showed old information, you may be out of disk space. It’s
surprising how often we forget to check this before investigating more esoteric
suspects, such as firewalls. Type:


# df 

Filesystem 1K-blocks Used Available Use% Mounted on 
/dev/hda1 193406200 455292 183126360 1% /

tmpfs 453368 0 453368 0% /dev/shm 


If you used a different User or Group directive in your Apache configuration, check 
that the user and group exist: 


# id www-data 
uid=33(www-data) gid=33(www-data) groups=33(www-data)


If the browser returned an Apache error message, you have some more digging to do. 
If the display says: 


Not Found 
The requested URL /wrong.html was not found on this server. 


the URL was probably mistyped. If you see: 


Forbidden 
You don't have permission to access /permissions.html on this server. 


the file is there, but the Apache user can’t read it: 


# cd /var/www 
# ls -l permissions.html 
-rw------- 1 root root 0 Jul 26 00:01 permissions.html


Permissions problems can be fixed by changing the owner of the file to the user that
runs the Apache process (www-data in this example), or by making the file readable
by that user.


Virtual Hosts Don’t Work 
Use 
# apache2ctl -S 


for a quick check of your virtual host directives. 


SSI Doesn’t Work 


If you see lines like this in your error log (/var/log/apache2/error.log): 
[error] an unknown filter was not added: INCLUDES 
you didn’t enable mod_include. Run the command: 


# a2enmod include 


CGI Program Doesn't Run 


If you can’t get a CGI program to run, work through the following checklist: 
• Has CGI been enabled, by one of the methods discussed earlier?
• Is the CGI program in a CGI directory like /var/cgi-bin, or does it have a suffix
  like .php?
• Is the file readable? If not, use chmod.
• What does the Apache error log say?
• How about the system error log, /var/log/messages?
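
The file checks in this list are easy to script. In this sketch, check_cgi is a hypothetical helper of our own. It also tests the execute bit, which is our addition: a script run as an external CGI program generally needs it. The tests apply to the user running the function, so run it as the Apache user for a meaningful answer.

```shell
# Hypothetical helper: report the first obvious file problem with a CGI program.
# Usage: check_cgi /path/to/program
check_cgi() {
    f=$1
    [ -f "$f" ] || { echo "missing: $f"; return 1; }
    [ -r "$f" ] || { echo "not readable: $f"; return 1; }
    [ -x "$f" ] || { echo "not executable: $f"; return 1; }
    echo "ok: $f"
}
```

If the function reports a problem, chmod is the usual fix; if it reports ok, move on to the error logs.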


SSL Doesn't Work 


Check that you enabled the Apache SSL module (a2enmod ssl) and told Apache to 
listen to port 443 in /etc/apache2/ports.conf: 


Listen 443 


If the directive wasn’t there, add it and restart Apache. Then try to access this URL in 
your browser: https://server1.centralsoft.org. If it still doesn’t work, port 443 may be 
blocked by a firewall. You can check this with nmap: 


# nmap -P0 -p 443 server1.centralsoft.org
Starting nmap 3.70 ( http://www.insecure.org/nmap/ ) at 2006-08-01 22:38 CDT 
Interesting ports on ... (...): 


PORT STATE SERVICE 
443/tcp open https 


Nmap run completed -- 1 IP address (1 host up) scanned in 0.254 seconds 





152 | Chapter 6: Administering Apache 


Further Reading 


You can explore the shadowed recesses of the Web in books like Apache Cookbook 
by Ken Coar and Rich Bowen (O’Reilly), Pro Apache by Peter Wainwright (Apress), 
and Run Your Own Web Server Using Linux & Apache by Stuart Langridge and Tony 
Steidler-Dennison (SitePoint). 
CHAPTER 7
Load-Balanced Clusters 














More than 10 years ago, people discovered they could connect multiple cheap
machines to perform computing tasks that would normally require a mainframe or
supercomputer. NASA’s Beowulf cluster was an early example that is still in use
today (http://www.beowulf.org). A Wikipedia entry
(http://en.wikipedia.org/wiki/Computer_cluster) lays out the chief characteristics
of a cluster succinctly:


A computer cluster is a group of loosely coupled computers that work together closely
so that in many respects they can be viewed as though they are a single computer.
Clusters are commonly, but not always, connected through fast local area networks.
Clusters are usually deployed to improve speed and/or reliability over that provided by
a single computer, while typically being much more cost-effective than single
computers of comparable speed or reliability.

Clusters are a good solution when you’re looking to improve speed, reliability, and
scalability for a reasonable price. Amazon, Yahoo!, and Google have built their
businesses on thousands of commodity servers in redundant cluster configurations. It’s
cheaper and easier to scale out (horizontally, by just adding more servers) than it is to
scale up (vertically, to more expensive machines). There are many Linux cluster
solutions, both open source and commercial. In this chapter we’ll discuss clusters based
on the free Linux Virtual Server (http://www.linuxvirtualserver.org). We’ll show how
to combine cereal boxes, rubber bands, and three computers into a load-balanced
Apache web server cluster. We’ll also discuss high availability and, finally,
alternatives to clusters. We won’t cover high-performance computing clusters, grid
computing, parallelization, or distributed computing; in these areas, hardware and
software are often specialized for the subject (say, weather modeling or graphics rendering).


Load Balancing and High Availability 


Load balancing (LB) provides scalability: the distribution of requests across multiple 
servers. LB consists of packet forwarding plus some knowledge of the service being 
balanced (in this chapter, HTTP). It relies on an external monitor to report the loads 
on the physical servers so it can decide where to send packets. 

High availability (HA) provides reliability: keeping services running. It relies on
redundant servers, a heartbeat exchange to say “I’m still alive,” and a failover
procedure to quickly substitute a healthy server for an ailing one.


In this chapter, we’re mainly concerned with LB, which administrators will generally
encounter first and need more often. As sites become more critical to an
organization, HA may also become necessary. Toward the end of this chapter, we’ll provide
some useful links for information on setting up combined LB/HA systems.


The example LB configuration we’ll use in this chapter is a simple one consisting of 
three public addresses and one virtual address, all listed in Table 7-1. 


Table 7-1. Addresses and roles for servers in our cluster 


Name    IP address       Description
lb      70.253.158.44    Load balancer: public web service address
web1    70.253.158.41    First web server: one of the real IPs (RIPs)
web2    70.253.158.45    Second web server: another RIP
(VIP)   70.253.158.42    Virtual IP (VIP) shared by lb, web1, and web2, in addition to
                         their real IP addresses





The VIP is the address exposed to external clients by the load balancer, which will 
relay requests to the web servers. 


Load-Balancing Software 


The simplest form of load balancing is round-robin DNS, where multiple A records 
are defined for the same name; this results in the servers taking turns responding to 
any incoming requests. This doesn’t work well if a server fails, though, and it doesn’t 
take into account any special needs the service may have. With HTTP, for example, 
we might need to maintain session data such as authentication or cookies and ensure 
that the same client always connects to the same server. To meet these needs, we’ll 
get a little more sophisticated and use two tools: 


• IP Virtual Server (IPVS), a transport-level (TCP) load-balancer module that is
  now a standard Linux component
• ldirectord, a utility that monitors the health of the load-balanced physical servers


The installation instructions are based on the Debian 3.1 (Sarge) Linux distribution. 


IPVS on the Load Balancer 


Since IPVS is already in the Linux kernel, we don’t need to install any software, but 
we do need to configure it. 

On lb, add these lines to /etc/modules:


ip_vs_dh
ip_vs_ftp
ip_vs
ip_vs_lblc
ip_vs_lblcr
ip_vs_lc
ip_vs_nq
ip_vs_rr
ip_vs_sed
ip_vs_sh
ip_vs_wlc
ip_vs_wrr


Then load the modules into the kernel: 


modprobe ip_vs_dh 
modprobe ip_vs_ftp
modprobe ip_vs 
modprobe ip_vs_lblc 
modprobe ip_vs_lblcr 
modprobe ip_vs_lc 
modprobe ip_vs_nq 
modprobe ip_vs_rr 
modprobe ip_vs_sed 
modprobe ip_vs_sh 
modprobe ip_vs_wlc 
modprobe ip_vs_wrr 
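
Typing twelve modprobe commands invites typos, so you may prefer a loop. This is a sketch only: load_ipvs_modules and its optional loader argument are our own invention, and we list the core ip_vs module first, although modprobe resolves module dependencies on its own.

```shell
# The twelve IPVS modules named above, with the core ip_vs module first.
IPVS_MODULES="ip_vs ip_vs_dh ip_vs_ftp ip_vs_lblc ip_vs_lblcr ip_vs_lc \
ip_vs_nq ip_vs_rr ip_vs_sed ip_vs_sh ip_vs_wlc ip_vs_wrr"

# load_ipvs_modules [LOADER]
# Runs LOADER (default: modprobe) once per module and stops at the first failure.
# Passing "echo" as LOADER gives a dry run that only prints the module names.
load_ipvs_modules() {
    loader=${1:-modprobe}
    for module in $IPVS_MODULES; do
        "$loader" "$module" || return 1
    done
}

# Dry run:            load_ipvs_modules echo
# Real run (as root): load_ipvs_modules
```

The dry run is a cheap way to check the list before touching the kernel.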








To enable packet forwarding in the Linux kernel on lb, edit the file /etc/sysctl.conf 
and add this line: 


net.ipv4.ip_forward = 1

Then load this setting into the kernel:


# sysctl -p
net.ipv4.ip_forward = 1


ldirectord


Although we could obtain ldirectord on its own, we’ll get it as part of the Ultra
Monkey package, which includes the heartbeat software for HA. Because Ultra Monkey
isn’t a part of the standard Debian distribution, you'll need to add these two lines to
your Debian repository file (/etc/apt/sources.list) on the lb machine:


deb http://www.ultramonkey.org/download/3/ sarge main
deb-src http://www.ultramonkey.org/download/3 sarge main 


Then update the repository and get the package: 


# apt-get update 
# apt-get install ultramonkey 

The installation process will ask you some questions:


Do you want to automatically load IPVS rules on boot?
No

Select a daemon method.
none

Our configuration will have one virtual server (the address that clients see, running
ldirectord), which we'll call the director, and two realservers (running Apache). The
realservers can be connected to the director in one of three ways:


LVS-NAT 
The realservers are in a NAT subnet behind the director and route their 
responses back through the director. 

LVS-DR 
The realservers route their responses directly back to the client. All machines are 
on the same subnet and can find each other’s level-2 (Ethernet) addresses. They 
do not need to be pingable from outside their subnet. 

LVS-TUN 
The realservers can be on a different network from the director. They
communicate by tunneling with IP-over-IP (IPIP) encapsulation.


We're going to use DR, because it’s easy, it’s fast, and it scales well. With this 
method, we designate a VIP that is shared by the load balancer and the realservers. 
This causes an immediate problem: if all machines share the same VIP, how do we 
resolve the VIP to a single physical MAC address? This is called the ARP problem, 
because systems on the same LAN use the Address Resolution Protocol (ARP) to find 
each other, and ARP expects each system to have a unique IP address. 


Many solutions require kernel patches or modules, and change along with changes to
the Linux kernel. In 2.6 and above, a popular solution is to let the load balancer
handle the ARP for the VIP and, on the realservers, to configure the VIP on aliases of the
loopback device. The reason is that loopback devices do not respond to ARP
requests.


That’s the approach we’ll take. We’ll configure the web servers first. 


Configuring the Realservers (Apache Nodes) 
On each realserver (web1 and web2), do the following: 


1. If the server doesn’t already have Apache installed, install it: 
# apt-get install apache2 
If you haven’t installed the content files for your web site, you can do it now or 
after load balancing is set up. 
2. Install iproute (a Linux networking package with more features than older utili- 
ties such as ifconfig and route): 
# apt-get install iproute 
3. Add these lines to /etc/sysctl.conf:

net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

4. Get the changes into the kernel:

# sysctl -p
net.ipv4.conf.all.arp_ignore = 1
net.ipv4.conf.eth0.arp_ignore = 1
net.ipv4.conf.all.arp_announce = 2
net.ipv4.conf.eth0.arp_announce = 2

5. Assuming that your realserver is a Debian system, edit the /etc/network/interfaces
file, associating the VIP (70.253.158.42) with the loopback alias lo:0:

auto lo:0
iface lo:0 inet static
address 70.253.158.42
netmask 255.255.255.255
pre-up sysctl -p > /dev/null

6. Enable the loopback alias:

# ifup lo:0

7. Create the file /var/www/director.html with the contents:

I'm alive!

8. On web1:

# echo "I'm web1" > /var/www/which.html

9. On web2:

# echo "I'm web2" > /var/www/which.html

10. Start Apache, or restart it if it’s already running:

# /etc/init.d/apache2 restart


The Apache access logs should not yet show any activity, because lb is not talking to 
them yet. 


Configuring the Load Balancer 


On lb, create the load balancer configuration file, /etc/ha.d/ldirectord.cf:


checktimeout=10
checkinterval=2
autoreload=no
logfile="local0"
quiescent=no
virtual=70.253.158.42:80
        real=70.253.158.41:80 gate
        real=70.253.158.45:80 gate
        service=http
        request="director.html"
        receive="I'm alive!"
        scheduler=rr
        protocol=tcp
        checktype=negotiate

If quiescent is yes, a faulty realserver gets a weight of 0 but remains in the LVS
routing table; we’ve set it to no, so dead servers will be removed from the pool. The
weight of a server reflects its capacity relative to the other servers. For a simple LB
scheme like ours, all live servers have a weight of 1 and dead ones have a weight of 0.
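
To make the rr (round-robin) scheduler concrete, here is a toy model of it in shell. It is only an illustration of the idea, not how IPVS is implemented inside the kernel, and rr_pick is a name we made up. A dead server is modeled by leaving it out of the argument list, which is what quiescent=no amounts to.

```shell
# rr_pick N SERVER...
# Print the server that handles request number N (counting from 0)
# when the listed live servers simply take turns.
rr_pick() {
    n=$1
    shift
    [ $# -gt 0 ] || return 1    # no live servers left in the pool
    shift $(( n % $# ))         # skip the servers whose turn has passed
    printf '%s\n' "$1"
}

# Both realservers alive: requests 0, 1, 2 go to .41, .45, .41.
# rr_pick 2 70.253.158.41 70.253.158.45
# web1 removed from the pool: every request goes to .45.
# rr_pick 2 70.253.158.45
```

With equal weights, each live server receives the same share of new connections.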


If checktype is negotiate, the director will make an HTTP request to each of the
realservers for the URL given in request, and see if the response contains the string
given in receive. If the value is connect, only a quick TCP connection check will be
done, and request and receive will be ignored.
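
The comparison step of a negotiate check can be sketched in a few lines of shell. The function negotiate_ok is hypothetical, and it treats the receive value as a plain substring, which is a simplification on our part. The commented example assumes that curl is installed on the director.

```shell
# negotiate_ok BODY RECEIVE
# Succeed (exit status 0) when the fetched page BODY contains the RECEIVE string.
negotiate_ok() {
    case $1 in
        *"$2"*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Example (assumes curl is installed):
# body=$(curl -s --max-time 10 http://70.253.158.41/director.html)
# negotiate_ok "$body" "I'm alive!" && echo up || echo down
```

A server that accepts connections but serves an error page fails this test, which is why negotiate is a stricter check than a bare TCP connection.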


The system startup files in /etc for ldirectord should have already been created during
the installation. Ultra Monkey also installed Heartbeat, which we aren’t using yet, so
let’s disable it for now:

# update-rc.d heartbeat remove
update-rc.d: /etc/init.d/heartbeat exists during rc.d purge (use -f to force)

If you see this complaint, repeat the command with the -f flag, as the message suggests.

The load balancer monitors the health of the web servers by regularly requesting the
file we specified in ldirectord.cf (request="director.html").


Since this server will be responding to web requests at the VIP address (70.253.158.42),
we'd better tell the server about it. Edit /etc/network/interfaces and add these lines to
create the alias device eth0:0:


auto eth0:0
iface eth0:0 inet static
address 70.253.158.42
netmask 255.255.255.248
# These should have the same values as for eth0:
network ...
broadcast ...
gateway ...


Now, fire up this new IP address:
# ifup eth0:0
Finally, start your engines on lb:


# /etc/init.d/ldirectord start 
Starting ldirectord... success 


Testing the System 

Let’s check that the load balancer is running on lb:
# ldirectord ldirectord.cf status

You should see something like this:


ldirectord for /etc/ha.d/ldirectord.cf is running with pid: 1455







If you see something like this instead: 
ldirectord is stopped for /etc/ha.d/ldirectord.cf 


there’s some problem. You can stop the director and restart it with the debug flag -d, 
and see whether any errors appear in the output: 


# /usr/sbin/ldirectord /etc/ha.d/ldirectord.cf stop 

# /usr/sbin/ldirectord -d /etc/ha.d/ldirectord.cf start 

DEBUG2: Running exec(/usr/sbin/ldirectord -d /etc/ha.d/ldirectord.cf start)
Running exec(/usr/sbin/ldirectord -d /etc/ha.d/ldirectord.cf start)
DEBUG2: Starting Linux Director v1.77.2.32 with pid: 12984
Starting Linux Director v1.77.2.32 with pid: 12984
DEBUG2: Running system(/sbin/ipvsadm -A -t 70.253.158.42:80 -s rr )
Running system(/sbin/ipvsadm -A -t 70.253.158.42:80 -s rr )
DEBUG2: Added virtual server: 70.253.158.42:80
Added virtual server: 70.253.158.42:80
DEBUG2: Disabled server=70.253.158.45
DEBUG2: Disabled server=70.253.158.41
DEBUG2: Checking negotiate: real
server=negotiate:http:tcp:70.253.158.41:80:::\/director\.html:I\'m\ alive\!
virtual=tcp:70.253.158.42:80)
DEBUG2: check_http: url="http://70.253.158.41:80/director.html"
virtualhost="70.253.158.41"
LWP::UserAgent::new: ()
LWP::UserAgent::request: ()
LWP::UserAgent::send_request: GET http://70.253.158.41:80/director.html
LWP::UserAgent::_need_proxy: Not proxied
LWP::Protocol::http::request: ()
LWP::Protocol::collect: read 11 bytes
LWP::UserAgent::request: Simple response: OK
45:80/director.html is up








The output is shorter if checktype is connect.


Just to be nosy, we’ll see what the lower-level IP virtual server says: 


# ipvsadm -L -n 
IP Virtual Server version 1.2.0 (size=4096) 
Prot LocalAddress:Port Scheduler Flags 


-> RemoteAddress:Port Forward Weight ActiveConn InActConn 
TCP 70.253.158.42:80 rr 

-> 70.253.158.45:80 Route 1 1 2 

-> 70.253.158.41:80 Route 1 0 3 


The ActiveConn column shows that the first realserver listed has one active
connection and the second has none.
We'll also check the system logs on lb:


# tail /var/log/syslog 

Sep 11 22:59:45 mail ldirectord[8543]: Added virtual server: 70.253.158.44:80
Sep 11 22:59:45 mail ldirectord[8543]: Added fallback server: 127.0.0.1:80
( x 70.253.158.44:80) (Weight set to 1)
Sep 11 22:59:45 mail ldirectord[8543]: Added real server: 70.253.158.41:80
( x 70.253.158.44:80) (Weight set to 1)
Sep 11 22:59:45 mail ldirectord[8543]: Deleted fallback server: 127.0.0.1:80
( x 70.253.158.44:80)
Sep 11 22:59:46 mail ldirectord[8543]: Added real server: 70.253.158.45:80
( x 70.253.158.44:80) (Weight set to 1)

Back on web1 and web2, check the Apache access logs. The director should demand 
director.html every checkinterval seconds: 


70.253.158.44 - - [11/Sep/2006:22:49:37 -0500] "GET /director.html HTTP/1.1"
200 11 "-" "Libwww-perl/5.803"
70.253.158.44 - - [11/Sep/2006:22:49:39 -0500] "GET /director.html HTTP/1.1"
200 11 "-" "Libwww-perl/5.803"


In your browser, go to the virtual site URL http://70.253.158.42/which.html, and you
should see either: 


I'm web1 
or: 
I'm web2 


If the load balancer is broken or one of the web servers is down, you might always 
get a response from the same web server. 
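
Reloading a browser page by hand makes it hard to judge how evenly requests are spread. One way to count the answers is sketched below: tally is a made-up helper name, and the commented example assumes that curl is installed on the client machine.

```shell
# tally: read one response per line on stdin and print "response: count",
# sorted by response text.
tally() {
    awk '{ seen[$0]++ } END { for (r in seen) print r ": " seen[r] }' | sort
}

# Example (assumes curl is installed):
# for i in 1 2 3 4 5 6; do curl -s http://70.253.158.42/which.html; done | tally
```

With both web servers healthy and the rr scheduler, the two counts should come out roughly equal.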


Now, stop Apache on web1: 
# /etc/init.d/apache2 stop


Reload/refresh your browser page to access http://70.253.158.42/which.html again. 
You should always get the response: 


I'm web2 


Adding HA to LB 


The load balancer is a single point of failure. If it starts pining for the fjords, the web 
servers behind it will become inaccessible. To make the system more reliable, you 
can install a second load balancer in an HA configuration with the first. Detailed 
instructions, which use the Ultra Monkey package that you’ve already installed, can 
be found in “How To Set Up A Loadbalanced High-Availability Apache Cluster,” 
(http://www.howtoforge.com/high_availability_loadbalanced_apache_cluster). 


You may not need HA for the Apache servers themselves, because ldirectord is
already prodding them every checkinterval seconds for status and adjusting weights,
which is similar in effect to the heartbeat of HA.


Adding Other LB Services 


We've used Apache web servers as this chapter’s example because they’re the most 
likely to be part of a server farm. Other services that could benefit from LB/HA include 
MySQL, email servers, or LDAP servers. See “How To Set Up A Load-Balanced 
MySQL Cluster” (http://www.howtoforge.com/loadbalanced_mysql_cluster_debian) for 
a MySQL example. 

Scaling Without LB and HA


If you offered a wonderful service, would your server survive a Slashdotting (i.e., a 
huge activity spike)? If not, your credibility could suffer and many visitors might 
never return. But because implementing LB and HA requires significant effort and 
hardware investments, it’s worth considering other solutions. There are ways to get 
more from your present server. For instance, you can disable .htaccess files in your 
Apache configuration (AllowOverride None), and use mod_expires to avoid stat calls 
for infrequently changed files such as images. Apache books and web sites contain 
many such optimization tips. 


Once you reach the limits of your web server software, consider alternatives. In many
cases, web servers such as lighttpd (http://www.lighttpd.net), Zeus
(http://www.zeustech.net), and litespeed (http://litespeedtech.com) are faster than
Apache and use less memory.


You can also get huge boosts from caching. Code caches, which include PHP
accelerators such as e-accelerator (http://eaccelerator.net) and APC
(http://apc.communityconnect.com), save PHP bytecode and avoid parsing overhead on each
page access. Data caches such as MySQL’s query cache save the results of identical
queries. Replication is a form of LB. memcached (http://danga.com/memcached) is a
fast way to cache data such as database lookup results. Squid
(http://www.squid-cache.org), when used as a caching reverse proxy, is a page cache
that can bypass the web server entirely.


When servers are in separate tiers (e.g., MySQL > PHP > Apache), improvements 
are multiplicative; for example, the presentation “Getting Rich with PHP 5” (http:// 
talks.php.net/show/oscon06) combines many small fixes to scale a PHP application 
from 17 calls/second to 1,100 calls/second on a single machine. 


If you’re already using these techniques and are still straining to meet demand, defi- 
nitely try LB, and provide HA if stability is critical. 


Further Reading 


More details on the software used in this chapter are available via the products’ web 
pages: 

• The Linux Virtual Server Project (http://www.linuxvirtualserver.org)

• Ultra Monkey (http://www.ultramonkey.org)

• Heartbeat/The High-Availability Linux Project (http://linux-ha.org)


You may also want to check out the Red Hat Cluster Suite
(http://www.redhat.com/software/rha/cluster), a commercial LB/HA product for Linux
built on LVS. The same software is freely available (but without support) in CentOS.


CHAPTER 8
Local Network Services 














In this chapter we’ll look at some skills a system administrator needs to manage a 
host behind the firewall or gateway of a company, an organization, or even a home 
network. 


Some of us prefer reading about developments in Internet technology rather than 
local area networks, which we think of as routine and unchallenging. But when we 
need to configure or fix something central to our working environment, local net- 
working moves up the value chain. For example, little else seems to matter when the 
CEO’s email doesn’t work. 


Local networking can take up the majority of a system administrator’s time if she 
isn’t smart about it. So, if you’ve just started in system administration, you’ll want a 
primer on LANs and how to install, configure, and maintain a number of different 
servers you'll find there. For the basics, take a look at the most recent edition of the
Linux Network Administrator’s Guide by Terry Dawson et al. (O’Reilly). As long as you
possess basic Linux user skills, though, even without such background the topics in 
this chapter shouldn’t be over your head—and we find them exciting. 


In this chapter, we’ll explore distributed filesystems with a unique slant, how to set 
up DHCP and gateway services (including routing between a LAN and the Internet), 
the craziness of corporate printing, and user management. Local email services fit 
under the umbrella of LAN topics, too, but we covered those issues in Chapter 5. 


We'll use the Fedora Core Linux distribution for this chapter. Red Hat sponsors the 
Fedora project and typically uses it for testing its next stable enterprise release. Fedora 
is not the ultra-stable version of Red Hat Enterprise Linux, but it’s reasonably stable 
and robust. Red Hat provides native packages of many tools for Fedora, putting 
Fedora on the leading edge of free Linux distributions available for commercial use. 


Whether you like the Red Hat model or not, you can apply the material in this chap- 
ter to other distributions of Linux. We suggest you dig into this material: it’s fun, 
you'll need it in practically any environment you work in, and you won’t find the 
bulk of this material elsewhere. 


Distributed Filesystems


You may find it difficult to imagine a time when PCs simply stood alone without the 
benefit of a network or a connection to the Internet. But PCs were not originally 
designed with networking in mind. You may or may not remember when people 
transferred files by walking floppy disks from one PC to another, or flipped a switch 
so two to four users could share a printer. Those were painful times. 


After the introduction of the PC, it took a number of years and innovations to cre- 
ate such basic networking conveniences as distributed filesystems. Getting those 
filesystems working on PCs transformed the landscape of business, because it 
allowed us to put a computer on everyone’s desk. No longer did we have to manu- 
ally fill out forms for keypunch operators to funnel into batch mainframe systems. 


Networking became more available and affordable when an IBM researcher, Barry 
Feigenbaum, turned a local DOS filesystem into a distributed one. His efforts helped 
create the Server Message Block (SMB) application protocol, and the era of system 
administrators and network engineers began. 


Distributed filesystems let users open, read, and write files stored on computers 
other than their own. In some environments, a single large computer stores files 
accessed by all users on the LAN; the central computer can even store the users’ 
home directories, so that all their work is essentially stored on it. In other environ- 
ments, users store files on their PCs but allow others to access those files. The two 
environments can be mixed, too. Whatever the configuration, this practice is called 
file sharing, and the directories (folders, in PC lingo) that users can access on the 
remote machines are called shares. 


PCs became prevalent in businesses toward the end of the 1980s, and local area net- 
works came into existence as PC use evolved and people discovered the need to share 
resources. 


Try to imagine what the introduction of a LAN must have been like to a closely situ- 
ated group of PC users who had never had network services. Suddenly, coworkers 
could conveniently share documents, print to devices some distance away from their 
desks, and answer emails from supervisors located across the office, campus, or 
country. That opened a lot of people’s eyes. 


Today, many sites store their users’ critical files on central servers, which control 
users’ access rights to the files. We’ll discuss user management later in this chapter. 


Introduction to Samba 


SMB file and printer sharing evolved under Microsoft’s guidance into the Common 
Internet File System (CIFS) protocol. CIFS has been published as a standard, but it’s 
poorly documented and contains lots of secret behaviors that Microsoft continues to 
evolve. However, an intrepid team of developers keeps reverse engineering the protocol,
and it has created one of the most popular free-software projects to implement
Microsoft file sharing on non-Microsoft systems: Samba. Samba is increasingly popu- 
lar; it has significant support for Windows and Linux desktops and is even used on 
Mac OS X. 


As a Linux system administrator, you will need at least a high-level understanding of 
Samba. If you wish to drill down deeper into Samba (and you should), many excel- 
lent books exist on the subject, including the online documentation guides at http:// 
samba.org. To use a common phrase, “An in-depth discussion of this topic goes far 
beyond the scope of this book.” Actually, we don’t see any reason to duplicate the 
excellent material already available. However, we do want to discuss Samba in 
enough detail for you to make it functional in your environment. Luckily, most dis- 
tributions provide simple, graphical frontends to Samba, and we’ll discuss some of 
those here. 


Certain central functions in CIFS networks (mostly involving the way systems find 
each other) take place on domain controllers: servers that provide files, printers, and 
various controlling operations. Samba can integrate Linux machines into Microsoft 
networks as file and print servers, domain controllers, or workgroup members. 


The latest iteration of Samba interoperates with Microsoft’s Active Directory. Samba 
combined with LDAP can also function as a robust authentication server, replacing 
both Microsoft NT domain controllers and Active Directory servers. 


Samba can also play a file-sharing role in simpler environments where members of 
small offices and/or departments of larger organizations use peer-to-peer networking. 
Desktop users can share their printers and files with others without those others hav- 
ing to authenticate. If sensitive functions such as financial accounting and record 
keeping are handled on one machine, stronger machine-level security policies can be 
implemented to shield that machine from other users without compromising its abil- 
ity to access the resources of the peer-to-peer network. 


Now, let’s take a look at a Linux/Windows network and see how you can set up 
Samba for your desktop users. 


Configuring the Network 


Figure 8-1 represents a network as it might be seen from a Linux system (the Xan- 
dros distribution, which is a convenient desktop Linux suitable for corporate envi- 
ronments). The tree view on the left side of the screen shows four computers named 
Athlon, Atlanta, Dallas, and Dell. Dallas offers a printer, along with several directo- 
ries, to the other systems; Dell also hosts a printer. One of the other computers runs 
Windows XP, and the other two run Windows 98. Linux ties them all together. The 
Linux system looks the same as a Windows system when viewed from the Network 
Neighborhood or My Network Places on one of those systems. 

Figure 8-1. Files and directories shared by Linux system, as viewed from a Windows PC


The right side of the screen in Figure 8-1 highlights the shared documents folders on
the node called Dallas, which is a Windows XP system. You also can see a word
processor file named xp_network_setup.sxw, which was saved in a native
OpenOffice.org Writer format (Version 1).


How difficult was it to set up this network? Aside from the standard wiring, Ether- 
net connections, and installation of the firewall and modem, the system basically 
installed itself. We followed standard setup procedures on both Windows 98 
machines. The systems used DHCP to obtain their IP addresses, DNS servers, and 
routes to a gateway. The router provided DHCP services and a private Internet 
address scheme using a Class C network (192.168.0.0 through 192.168.0.255). 
(We’ll discuss DHCP in the next section.) 


Once the Windows systems established their network configurations and could 
reach the Internet, we right-clicked Network Neighborhood, selected Properties, and 
changed the dynamic addresses to static ones. This allowed the workstations to act 
as print servers and provide shared access to the Internet. 


Setting up the Windows XP systems was slightly more complicated, because at first 
the XP and (now unsupported) Windows 98 machines didn’t see each other. To 
make them aware of each other, we had to enable Simple File Sharing via XP’s control
panel and run the Network Setup Wizard. The wizard asked us if we wanted to
enable sharing on other computers, referring to Windows 98 machines. Answering 
yes enabled us to create a floppy disk that we could use to install the XP protocols on 
the Windows 98 computers. This process upgraded the older systems to the newer 
protocols, enabling the XP and Windows 98 boxes to communicate. (The program 
furnished by Microsoft is called netsetup.exe.) 


We then installed the Xandros Linux desktop and enabled Windows Networking on 
it, as shown in Figure 8-2. 

Figure 8-2. Configuring Windows Networking


Notice that we were able to configure Windows Networking via a dialog box. The 
Linux desktop allowed us to enable file and printer sharing, name the computer, 
define the workgroup, and enable share-level security, which allows the Windows 
nodes to use CIFS functionality. 


Other Linux distributions, such as Fedora and Ubuntu, also offer easy tools for set- 
ting up Windows file sharing. Figure 8-3 shows two configuration screens for the 
Ubuntu desktop. 


Ubuntu also gives you the option of setting up the Network File System (NFS), a 
popular Unix-to-Unix file-sharing system that is incompatible with CIFS. The dialog 
box in Figure 8-4 lets you choose either or both systems; you can use Samba to inter- 
operate with Windows and Mac OS X, while using NFS to interoperate with other 
Unix/Linux systems. Sharing services in Ubuntu are not installed by default, but if 
you select Shared Folders (under the Administration menu in Ubuntu 6.10), Ubuntu 
downloads the necessary files; you’re then ready to become a member of the domain 
or workgroup. 

Figure 8-3. Setting up Ubuntu shares in a Windows environment

Figure 8-4. Ubuntu’s setup screen for file-sharing services


We'll dig deeper into Samba issues in the section “Print Services” later in this chapter.


DHCP 


Dynamic Host Configuration Protocol (DHCP) services can help you with a number 
of problems associated with local network environments, including IP address 
assignment problems and administration issues. It’s difficult to imagine a network 
without DHCP. 


Let’s look at some issues you may face, and consider how DHCP can help: 


• PCs and workstations require unique IP addresses, DNS information, and the
  locations of gateways.

• Manually tracking IP addresses causes excessive work.

• Accidental duplication of IP addresses creates conflicts on the network.

• Troubleshooting address problems (such as duplicate addresses) and changes in
  location creates unnecessary work.

• Changes in personnel usually mean that someone will have to check each computer
  to configure a new database of IP assignments.

• Frequent movement of mobile users creates a need to reconfigure networking on
  laptops.


DHCP solves these problems by handing out IP addresses as needed to each system 
on a LAN when those systems boot up. The DHCP server ensures that all IP 
addresses are unique. The service requires little human involvement in the assign- 
ment and maintenance of IP addresses. Administrators can write the configuration 
files and leave the rest up to the DHCP server (dhcpd). This server manages the IP 
address pool, freeing a human network administrator from that task. 


Installing DHCP 


To get started with DHCP, you first need to install the DHCP server. Because this
chapter focuses on Fedora, you can install the RPM package with Yum or the package
manager /usr/bin/gnome-app-install; the current version of the package is
dhcp-3.0.3-28.i386. (Debian users can install the dhcp3-server package and edit the
configuration file /etc/dhcp3/dhcpd.conf.) The software originates from the Internet
Systems Consortium.


Once you’ve installed it, configure DHCP in /etc/dhcpd.conf. As a first step, copy the 
file /usr/share/doc/dhcp/dhcpd.conf.sample to /etc/dhcpd.conf. Next, edit the file to fit 
your network. The following example is typical. The syntax uses pound signs (#) for 
comments: 


ddns-update-style interim;
ignore client-updates;

subnet 192.168.1.0 netmask 255.255.255.0 {

    # --- default gateway
    option routers 192.168.1.1;
    option subnet-mask 255.255.255.0;

    # --- option nis-domain "domain.org";
    # --- option domain-name "domain.org";
    option domain-name-servers 192.168.1.1;

    # --- option time-offset -18000; # Eastern Standard Time
    option ntp-servers 192.168.1.1;
    option netbios-name-servers 192.168.1.1;
    # --- Selects point-to-point node (default is hybrid). Don't change this
    # --- unless you understand Netbios very well
    option netbios-node-type 2;

    # --- range dynamic-bootp 192.168.0.128 192.168.0.254;
    default-lease-time 21600;
    max-lease-time 43200;

    # we want the nameserver to appear at a fixed address
    host ns {
        next-server server1.centralsoft.org;
        hardware ethernet 00:16:3E:63:C7:76;
        fixed-address 70.253.158.42;
    }
}


We configured a few items in our configuration file after we copied it to the /etc
directory:


subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;
    option domain-name-servers 192.168.1.1;
    option subnet-mask 255.255.255.0;
    default-lease-time 21600;
    max-lease-time 43200;


The first line declares the subnet from which the DHCP server hands out addresses to
the users on the LAN. In this case we used the reserved private Class C network
192.168.1.0, which provides 254 host addresses (192.168.1.1 through 192.168.1.254).
The netmask must match the netmask used to define your LAN.
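

To see where the figure of 254 comes from, here is a small shell sketch of the
arithmetic. The function name usable_hosts is our own invention for illustration,
not a standard utility. It takes the prefix length (24 for a 255.255.255.0 netmask)
and assumes a shell that supports POSIX arithmetic expansion:

```shell
# usable_hosts PREFIX_LENGTH: print the number of assignable host addresses
# in an IPv4 subnet. Hypothetical helper for illustration; not part of dhcpd.
usable_hosts() {
    # A subnet holds 2^(32 - prefix) addresses; the first (network) and
    # last (broadcast) addresses cannot be assigned to hosts.
    echo $(( (1 << (32 - $1)) - 2 ))
}

usable_hosts 24   # netmask 255.255.255.0   -> 254
usable_hosts 29   # netmask 255.255.255.248 -> 6
```

The subtraction of 2 is why a range statement for this subnet can run at most from
192.168.1.1 through 192.168.1.254.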


We specified the gateway address in the second line, option routers, and a caching 
nameserver in the third line, option domain-name-servers. The IP address is the same 
on both lines, which reflects common practice. 


A single server with two network cards often acts as a gateway in a local area net- 
work. One card, represented by a device name such as eth0, has an address on the 
Internet, while the other card (say, eth1) has an address on the private network. 


When packet forwarding and iptables firewalling are enabled, any Linux server can 
act as a gateway/firewall. In this case, we also enabled BIND in caching mode to 
function as the network’s DNS server. 


The last two lines specify the amount of time a client can keep the address, mea- 
sured in seconds. 
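

Lease times expressed in seconds are hard to read at a glance, so it can help to
convert them. The following is only a sketch; lease_duration is a name we made up,
and nothing like it ships with dhcpd:

```shell
# lease_duration SECONDS: print a lease time as hours and minutes.
# Hypothetical helper for illustration only.
lease_duration() {
    echo "$(( $1 / 3600 ))h $(( $1 % 3600 / 60 ))m"
}

lease_duration 21600   # default-lease-time above -> 6h 0m
lease_duration 43200   # max-lease-time above     -> 12h 0m
```

In other words, the configuration above gives a client its address for six hours
unless it asks for something else, and never for more than twelve.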


In our DHCP configuration file, we also added a clause to specify a static address for 
a corporate DNS server: 


# we want the nameserver to appear at a fixed address
host ns {
    next-server server1.centralsoft.org;
    hardware ethernet 00:16:3E:63:C7:76;
    fixed-address 70.253.158.42;
}


In the upcoming section “Providing Static IP Addresses” we'll discuss how to
use dhcpd to hand out static IP addresses based on the MAC address of a client’s
network card. But before we do, let’s look at a simple version of /etc/dhcpd.conf:


ddns-update-style interim; 


default-lease-time 600; 
max-lease-time 7200; 


subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;
    option subnet-mask 255.255.255.0;
    option domain-name-servers server1.centralsoft.org,
                               server2.centralsoft.org;
    range 192.168.1.2 192.168.1.254;
}


For simple DHCP servers, maintenance may actually be easier if you 
omit comments and keep the configuration file short. 








Starting Your DHCP Service 


Some DHCP services require a dhcpd.leases file. If the file doesn’t already exist, use
the touch command to create an empty one:


# touch /var/lib/dhcp/dhcpd.leases


You'll want to start your DHCP server now, to check whether the configuration is 
correct. You'll also want to configure the server to start on boot. To accomplish the 
first task, enter: 

[root@host2 ~]# service dhcpd start
Starting dhcpd: [ OK ]
[root@host2 ~]#


You can also test whether the DHCP process is running with the following command
(if the service is running, a line will be displayed with the process’s statistics):


# ps aux | grep dhcpd 
root 9028 0.0 0.0 2552 636 Ss 09:40 0:00 /usr/sbin/dhcpd 


Use the chkconfig command to get DHCP to start at boot time: 


# chkconfig dhcpd on
# chkconfig --list
.... from the list:
dhcpd 0:off 1:off 2:on 3:on 4:on 5:on 6:off


As with other services under Linux, you’ll need to restart the DHCP daemon whenever
you make changes to your configuration files. You can set other options in the
dhcpd.conf file globally or for a client machine or subnet. This means you can
establish useful defaults for your network, then override them for a certain group of
machines or even individual machines. Here’s an example of a global configuration
section at the top of a dhcpd.conf file:


option domain-name "host2.centralsoft.org";


Providing Static IP Addresses 


Workstations usually function fine with dynamic addresses (that is, addresses that 
can change periodically or upon reboot), but servers usually benefit from static 
addresses so that their addresses don’t change while they’re in the middle of a ses- 
sion with a client. Thus, DHCP lets you specify static IP addresses for particular sys- 
tems in dhcpd.conf. Let’s do this in steps. 


First, set up the subnet, broadcast address, and routers: 


subnet 192.168.1.0 netmask 255.255.255.0 {
    option broadcast-address 192.168.1.255;
    option routers 192.168.1.1;
}


Next, add a host section for each machine on your network. To do this, you need to
know the hardware address (often called the MAC address) for each network card, 
which you can determine by using the ifconfig command on the host. Here’s an 
example host section: 


# ethernet MAC address as follows (host's name is "laser-printer"):

host laser-printer {
    hardware ethernet 08:00:2b:4c:59:23;
    fixed-address 192.168.1.10;
}

host host1.centralsoft.com {
    hardware ethernet 01:0:c0:2d:8c:33;
    fixed-address 192.168.1.5;
}


Create a configuration clause like this for each server needing a static IP address, and
add it to the configuration file.
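

If you have more than a handful of machines, typing these clauses by hand invites
mistakes. As a rough sketch, a shell function can print a clause from a name, a
MAC address, and an IP address. The name dhcp_host_entry is hypothetical (it is not
an ISC tool), and the function does no validation of its arguments:

```shell
# dhcp_host_entry NAME MAC IP: print a dhcpd.conf host clause.
# Hypothetical helper for illustration; it does not check its arguments.
dhcp_host_entry() {
    printf 'host %s {\n    hardware ethernet %s;\n    fixed-address %s;\n}\n' \
        "$1" "$2" "$3"
}

# Reproduces the laser-printer clause shown above.
dhcp_host_entry laser-printer 08:00:2b:4c:59:23 192.168.1.10
```

You could redirect the output to a scratch file, review it, and then paste it into
dhcpd.conf before restarting the daemon.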


Assigning IPv6 Addresses with radvd 


Back in 1995, Steve Deering and Robert Hinden realized the need for a new Internet
Protocol addressing system. Their first specification for IPv6 appeared in 1995, in
IETF Request for Comments (RFC) 1883; the second appeared in 1998, in RFC
2460. Deering and Hinden articulated what many people already knew: that IPv4’s
32-bit address space would limit the explosive growth of the Internet.


Few system administrators realize that IPv6 and its new methods for assigning IP 
addresses have started gaining in popularity. Although many people scoff at IPv6, 
saying either that it is unnecessary or that the weight of existing practice will prevent 
it from ever entering the mainstream, enough applications and environments require
it that the tide is turning in its direction. 





An extensive discussion of IPv6 is, again, out of this book’s scope; for
more information on the IPv6 protocol and daemon, as well as on
obtaining public IPv6 addresses, you’ll have to look elsewhere.





IPv6 addresses often include the hardware addresses of network cards. This property 
allows IPv6 users to obtain static IP addresses without requiring any configuration 
on the server side to support those addresses. Automatic assignment of IPv6 
addresses can be done with the help of the router-advertising daemon radvd. 
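

The usual way a hardware address ends up inside an IPv6 address is the modified
EUI-64 scheme: ff:fe is inserted in the middle of the 48-bit MAC address, and the
universal/local bit of the first octet is flipped. The sketch below assumes the
host uses that scheme (some systems generate random interface identifiers instead),
and mac_to_eui64 is a name we invented for illustration:

```shell
# mac_to_eui64 MAC: print the modified EUI-64 interface identifier (the low
# 64 bits of an autoconfigured IPv6 address) for a 48-bit MAC address.
# Hypothetical helper for illustration; not part of radvd.
mac_to_eui64() (
    # Lowercase the MAC, then split it into six octets on the colons.
    mac=$(printf '%s' "$1" | tr 'A-F' 'a-f')
    IFS=:
    set -- $mac
    # Flip the universal/local bit (0x02) of the first octet.
    first=$(printf '%02x' $(( 0x$1 ^ 0x02 )))
    # Insert ff:fe between the third and fourth octets.
    printf '%s%s:%sff:fe%s:%s%s\n' "$first" "$2" "$3" "$4" "$5" "$6"
)

mac_to_eui64 00:16:3E:63:C7:76   # prints 0216:3eff:fe63:c776
```

The host appends this identifier to the 64-bit prefix that radvd advertises, which
is why the resulting address stays the same for as long as the network card does.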


Fedora users can install the radvd-0.9.1 package from their Yum repositories. 
Debian users can install the radvd package and read the file
/usr/share/doc/radvd/README.Debian.


radvd listens to router solicitations and sends router advertisements as described in 
RFC 2461, “Neighbor Discovery for IP Version 6 (IPv6).” Hosts can automatically con- 
figure their addresses and choose their default routers based on these advertisements. 


radvd supports a simple protocol. You’ll also find its configuration simple. An exam- 
ple of a fully configured /etc/radvd.conf file looks like this: 


interface eth0
{
    AdvSendAdvert on;
    prefix 0:70:1f00:96::/64
    {
    };
};


If you wish to use radvd, you’ll need to change the prefix to the one for your net- 
work and set up the service. You will also need to configure DNS on your client 
workstations separately. 


You can find the radvd project home page at http://www.litech.org/radvd. 


Gateway Services 


Linux has facilities for LAN users to browse the Internet without exposing their indi- 
vidual IP addresses to the public. The typical setup hides activities inside an organi- 
zation from the public by using Linux as a router. On the private side of the router, 
local activities go undetected by anyone on the public side. 


People sometimes also refer to a gateway as a bastion host. You might think of it as a 
network entity that provides a single entrance and exit point to the Internet. Bastion 
hosts help prevent the cracking of a network by providing a barrier between private 
and public areas. We refer to the services they provide as gateway services. 


Linux system administrators implement gateway services by using a combination of
packet forwarding and firewall rules known as iptables. You might also see other 
names for gateway services, such as masquerading or Network Address Translation 
(NAT). 


In small organizations and home networks, a gateway can exist on a single server and 
include basic security, a firewall, and DHCP, caching DNS, and mail services. In 
larger organizations, such services are generally spread across several servers, with a 
demilitarized zone (DMZ) isolating the gateway. 





Role of a DMZ 


In computer security, the term demilitarized zone refers to a perimeter network, which 
is a subnet or network that sits between an internal network and the Internet. For 
example, your private network might use an internal network of 192.168.1.0, the DMZ 
10.0.0.0, and the public Internet block 70.253.158.0. 


DMZs are used to contain servers that need to be accessible from the outside world, 
such as email, web, and DNS servers. Connections from the Internet to the DMZ are 
usually controlled using Port Address Translation (PAT). 


The source and destination for every IP packet contain an IP address and a port. Port 
translation makes changes to both the sender’s and recipient’s addresses on data pack- 
ets. Port numbers, not IP addresses, are used to designate different computers on the 
inside network. 


A DMZ typically sits in the middle of two gateways or firewalls and connects to both, 
with one network interface card connected to the internal network and the other to the 
Internet. A DMZ can prevent accidental misconfiguration that would allow access 
from the Internet to the internal network. We call this a screened-subnet firewall. 











For our purposes, we’ll limit the gateway configuration to packet forwarding; we 
won’t spend time on a DMZ, which requires more equipment and effort. To build a 
gateway, you need: 

• A dedicated computer to act as the gateway

• A connection to the Internet and two network cards

• A small switch for client machines to connect to the gateway

• iptables installed


We'll assume that eth0 is your Internet connection and eth1 is your internal gateway
in this configuration. Edit the configuration file for eth0, which is in
/etc/sysconfig/networking/devices/ifcfg-eth0, to include the following lines:

ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes
GATEWAY=70.253.158.46
TYPE=Ethernet
DEVICE=eth0
HWADDR=00:04:61:43:75:ee
BOOTPROTO=none
NETMASK=255.255.255.248
IPADDR=70.253.158.43


Similarly, the configuration for eth1 should look like this: 


ONBOOT=yes
USERCTL=no
IPV6INIT=no
PEERDNS=yes
TYPE=Ethernet
DEVICE=eth1
HWADDR=00:13:46:e6:e5:83
BOOTPROTO=none
NETMASK=255.255.255.0
IPADDR=192.168.1.1








Information on these configuration parameters can be found in the file sysconfig.txt, 
which you'll find in a directory with a name similar to /usr/share/doc/initscripts-7.93.7. 


With your network cards configured, you need to make sure you’ve installed iptables.
Query the RPM database; you should see a result like the following:


[root@host2 devices]# rpm -q iptables
iptables-1.3.5-1.2
[root@host2 devices]#


If you don’t have iptables installed, install it now and load the modules. 


Fedora 5 will install iptables using the Add/Remove Software application,
located directly above the Applications menu on the GNOME panel. It also
loads the kernel modules as part of the installation process.








Then run: 


# iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
# service iptables save 
# echo 1 > /proc/sys/net/ipv4/ip_forward 


Now edit /etc/sysctl.conf, changing net.ipv4.ip_forward = 0 to 1 to keep this 
enabled at reboot. You make the system re-read /etc/sysctl.conf by typing: 


# sysctl -p 
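

If you would rather not open an editor, the change to /etc/sysctl.conf can be
sketched with sed. This is only an illustration under our own assumptions: the
function name set_ip_forward is made up, it expects the file to already contain a
net.ipv4.ip_forward = 0 line, and it prints the result instead of modifying the file
so that you can review it first:

```shell
# set_ip_forward FILE: print FILE with net.ipv4.ip_forward switched from 0 to 1.
# Hypothetical helper for illustration; it never modifies FILE itself.
set_ip_forward() {
    sed 's/^net\.ipv4\.ip_forward[[:space:]]*=[[:space:]]*0[[:space:]]*$/net.ipv4.ip_forward = 1/' "$1"
}

# Demonstrate on a scratch file rather than the live /etc/sysctl.conf.
tmp=$(mktemp)
printf '%s\n' '# Controls IP packet forwarding' 'net.ipv4.ip_forward = 0' > "$tmp"
set_ip_forward "$tmp"
rm -f "$tmp"
```

Once the output looks right for your real file, save it over /etc/sysctl.conf as
root and run sysctl -p as shown above.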


Finally, if you have a small organization, you can add DHCP to the server using a 
simple version of dhcpd.conf: 


ddns-update-style interim; 


default-lease-time 600; 
max-lease-time 7200; 
subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;
    option subnet-mask 255.255.255.0;
    option domain-name-servers server1.centralsoft.org,
                               server2.centralsoft.org;
    range 192.168.1.2 192.168.1.254;
}


Another Approach to Gateway Services 


This section covers the use of packaged gateway and firewall combination products 
with multiple feature sets. Several free packages exist, such as Firestarter, IPCop, 
Netfilter, and Shorewall. You will see Smoothwall and ClarkConnect mentioned in 
Linux literature, but these are commercial products that install an entire Linux distri- 
bution, not standalone applications. 


For use in this chapter, we chose Firestarter. However, you may want to take a look
at Shorewall, a command-line configuration utility for Netfilter.


You can download Firestarter from the Fedora repositories. Our installation had the
following package:


[root@host2 ~]# rpm -q firestarter
firestarter-1.0.3-11.fc5
[root@host2 ~]#


The Firestarter Firewall Wizard (Figure 8-5) launches when an administrator starts 
the program the first time. You can relaunch the wizard from the Firewall menu in 
the main interface, as well as change the choices through the Preferences option. 








Figure 8-5. The Firestarter Firewall Wizard


After the initial splash screen there will be a series of configuration screens, starting
with the Network device setup screen (Figure 8-6), which can set up dual network
cards.








Figure 8-6. The Network device setup screen














Firestarter refers to its primary function as connection sharing. However, since it uses 
NAT it functions as a gateway, so client PCs on an internal LAN look like a single 
machine with a single IP address to the Internet. This becomes evident, for example, 
in the preferences screen shown in Figure 8-7. Notice that the first device descrip- 
tion refers to the “Internet connected network device” and the second description 
refers to the “local network connected device.” 


You can also see toward the bottom of Figure 8-7 that Firestarter allows the adminis- 
trator to use an existing DHCP configuration or create a new one. Here’s Fire- 
starter’s dhcpd.conf file: 


# DHCP configuration generated by Firestarter 
ddns-update-style interim; 
ignore client-updates; 


subnet 192.168.1.0 netmask 255.255.255.0 {
    option routers 192.168.1.1;
    option subnet-mask 255.255.255.0;
    option domain-name-servers 70.253.158.42, 70.253.158.45, 151.164.1.8;
    option ip-forwarding off;
    range dynamic-bootp 192.168.1.10 192.168.1.254;
    default-lease-time 21600;
    max-lease-time 43200;
}


Figure 8-7. Firestarter Preferences screen
































Firestarter reads the resolv.conf file on the gateway and places the DNS server
addresses it finds there in dhcpd.conf, so those addresses show up in the
configuration settings of the DHCP client machines.
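

The mapping from resolv.conf to dhcpd.conf can be sketched in a few lines of shell. This is only an illustration of the idea, not Firestarter's own code; dns_option_line is a hypothetical helper name, and we assume a resolv.conf in the usual "nameserver address" format:

```shell
#!/bin/sh
# Sketch: build a dhcpd.conf "option domain-name-servers" line from the
# nameserver entries in a resolv.conf-style file.
# dns_option_line is a hypothetical helper, not part of Firestarter.
dns_option_line() {
    # $1: path to a resolv.conf-style file (defaults to /etc/resolv.conf)
    awk '
        # Collect the address from every "nameserver <address>" line.
        $1 == "nameserver" { list = list (list ? ", " : "") $2 }
        # Print a single option line if at least one server was found.
        END { if (list) print "option domain-name-servers " list ";" }
    ' "${1:-/etc/resolv.conf}"
}
```

Running dns_option_line with no argument on the gateway should print a line in the same form as the domain-name-servers entry in the listing above.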


The main interface of Firestarter provides a view of the gateway’s status and connec- 
tions to DHCP hosts. It also provides a summary of events and activity, as shown in 
Figure 8-8. 


In Figure 8-9, you can see a view of events from the second tab of the main interface. 
In this view, you can see the blocked connections. 


The Events panel provides a log of attempts to exploit the firewall. You might find it 
useful when intruders attempt to break into your systems. If they seem to persist, 
add their IP addresses to the /etc/hosts.deny file. If someone attempts to enter 
through ssh’s port 22 using a dictionary attack, you can simply close the port with 
Firestarter. 
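

If you want to back up what the Events panel shows with evidence from the system logs, a short script can count failed SSH logins per source address. The sketch below makes several assumptions: failed_ssh_sources is a hypothetical helper name, the log lines use OpenSSH's usual "Failed password for ... from ADDRESS port ..." wording, and the log lives somewhere like /var/log/secure (Fedora) or /var/log/auth.log (Debian). Check your own system before relying on it.

```shell
#!/bin/sh
# Sketch: count failed SSH password attempts per source address.
# failed_ssh_sources is a hypothetical helper; the log format is assumed
# to be OpenSSH's "Failed password for USER from ADDRESS port N ssh2".
failed_ssh_sources() {
    # $1: log file to scan
    # $2: minimum number of failures to report (default 5)
    awk -v min="${2:-5}" '
        /sshd/ && /Failed password/ {
            ip = ""
            # The source address is the word that follows "from".
            for (i = 1; i < NF; i++)
                if ($i == "from") ip = $(i + 1)
            if (ip != "") count[ip]++
        }
        END {
            for (ip in count)
                if (count[ip] >= min) print count[ip], ip
        }
    ' "$1" | sort -rn
}
```

For example, failed_ssh_sources /var/log/secure 10 would list the addresses with ten or more failures, most persistent first. On systems where sshd honors /etc/hosts.deny, an address from that list could then be blocked with an entry of the form sshd: followed by the address.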


The Firestarter icon turns red when it sees a potential exploit in the making. Notice 
the message above it in Figure 8-10: “Hit from 221.237.38.68 detected.” That’s 
worth investigating. 


The third tab on the main interface allows you to set policies for services you will or 
will not allow. For example, we allow SSH connections into the firewall from the 
outside, so we set a policy to allow SSH on port 22. 


Figure 8-8. Firestarter’s main interface


Firestarter uses a wizard to configure gateway policies. You can get a glimpse of how 
this works in Figure 8-11. 


Figure 8-11 shows a window named “Add new inbound rule.” This screen appears
after you select Add Rule on the Policy tab. In this window, you can see a selection of
options you can use to allow services into the network. A similar screen exists for
outbound services you provide your users.


You will find Firestarter an easy application to configure. The project community has
done an outstanding job of documenting the procedures in a well-written and
succinct user guide, which you can find at http://fs-security.com/docs.php.


Figure 8-9. Firestarter’s Events panel





Figure 8-10. Panel icons showing an attempted intrusion





At this point, you may be wondering why we’ve included an application
dependent on the GNOME desktop. Recall that when we chose Fedora as the
distribution for local networking, we did so because of its extensive tool set.
Adding Firestarter fits into our philosophy without removing our ability to use
the command-line interface.


Figure 8-11. Policy configurations for Firestarter


Print Services 


Printers can cause a Linux system administrator serious headaches. You’re
bound to find hardware, software, and operating system incompatibilities. Because 
such a wide variety of systems and methods of configuring printers exists, this area of 
administration has the potential to put you in a bad mood for months—or at least 
until you get a handle on the situation. 


Let’s start with hardware. Most system administrators will discover four types of 
hardware for networking printers. In existing networks, you may find any combina- 
tion of these configured: 


• Printers attached to users’ PCs

• Dedicated PCs used as printer servers

• Network-enabled printers with built-in Ethernet cards

• Printer server devices connecting printers directly to a LAN


In most medium-sized office buildings, you’ll probably see several of these solutions 
in use every time you turn a corner. The flexibility provided by modern desktop sys- 
tems often causes problems. 


Let’s say that one of your users, Sally Jean, buys an inkjet printer, goes down to the 
petty cash window and gets reimbursed for it, then connects it directly to her PC. 
Billy Bob, who’s seated at the desk next to her, then asks if he can use her printer. So,
she right-clicks the printer on her desktop and selects “Share.” Billy Bob tries to
connect to Sally’s printer, but it doesn’t work. Why? He doesn’t have the driver
installed.


So, these two users call the system administrator (that’s you) to come fix the prob- 
lem. You install the driver on Billy Bob’s PC, and suddenly, just like magic, it works. 
Later, Sally Jean calls and complains that her PC needs more memory and a faster 
processor. Why? Ten people are now using her printer because she has an open 
share, and it’s slowing her down. 


When you check out the situation, you see that just around the corner a large-volume 
laser printer with a print direct card is sitting idle. Why aren’t all those users printing 
to that printer? As it turns out, it doesn’t show up on the network because no one’s 
bothered to add it to the domain controller. 


What this hypothetical anecdote shows is that you, as a system administrator, need 
to prepare a strategy for managing your printer infrastructure. This section of the 
chapter will provide you with a high-level overview and enough practical informa- 
tion to get you started. You can begin the process with a hardware inventory and 
some decision making regarding software and operating systems. 


Because there are so many types of printers and combinations of devices, operating 
systems, and software out there, you’ll have to do most of your printing-related 
learning on the job. The best approach to learning about printing involves develop- 
ing a strategy for your own infrastructure. That narrows down the amount of infor- 
mation you'll need to digest. 


Printing Software Considerations 


Linux and Windows started off with completely different printing models. Fortu- 
nately, progress has been made in getting everyone to cooperate and play nicely. But 
until you configure the printers in your network, they’re still not likely to work 
together. 


Originally, Linux used the Unix standard for printing known as Line Printer
Daemon (LPD); later, an upgraded daemon called LPRng was added. Linux
distributions also used the LPD tools for printing and interoperability with Unix variants.


Linux distributors continue to ship LPDs and their tools, but they’ve also added
support for a new system known as the Common Unix Printing System (CUPS). Unlike
LPD, CUPS is also compatible with the Windows and Mac OSs. CUPS and LPD use
different network printing protocols. Whereas LPD cannot query a print job for basic
characteristics, CUPS can. CUPS also works directly in heterogeneous networks and
can couple with Samba if necessary. Not all Linux distributions enable the interface,
but Red Hat includes CUPS in Fedora by default.


As a system administrator, you will want to familiarize yourself with CUPS adminis- 
trative tools. In Fedora, simply type http://localhost:631 in a browser, and you will 
see the management interface presented in Figure 8-12. 





Figure 8-12. The CUPS configuration interface


The interface is self-explanatory, so we’ll leave its exploration up to you. If you lack 
familiarity with CUPS, take a look at the management interface or go to the project 
web site at http://www.cups.org/book/index.php and read the book. 


Cross-Platform Printing 


Now, let’s consider some of the printing dilemmas you’re likely to face in today’s 
enterprise environments. You'll almost certainly find situations where you want to 
share Linux printers with Windows machines. (In fact, you’ll probably want to use
Linux as a print server in a Windows network to save license fees.) You may also
want to share Windows printers with Linux machines. How do you do that?


First, let’s look at giving Windows users access to Linux-connected printers. Typi- 
cally, you’ll need to set up a Samba workgroup or domain, and you will need to 
install CUPS on your Linux PCs. You’ll also need to configure CUPS for Samba, 
which you can do with the following command: 


# ln -s `which smbspool` /usr/lib/cups/backend/smb


Edit /etc/samba/smb.conf to create a printer share on a Samba server. In a real-life sit- 
uation you're likely to restrict access to certain systems or users for each printer, but 
in the following example the Linux PC will share all of its printers with any systems 
on your network that you’ve configured Samba to serve: 

[printers]
    comment = All Printers
    printing = cups
    printcap name = cups


Your Windows PCs can now access printers over the network. You will probably 
need the Windows print drivers, either from your Windows version’s media or the 
media that came with your printer. 


In the next scenario, you need to enable your Linux users to use printers connected 
to Windows servers. Again, you need CUPS and Samba to do this. On the Windows 
PCs, share the printers as you normally would: under Windows NT, 2000, and/or 
XP, enable the guest account and provide permissions for everyone to access the 
shared printers. Then install CUPS on the Samba server and configure it for Samba as 
described earlier. 


Now install the Windows printers you want to make available on the Samba server 
with CUPS, using the CUPS web interface. 


You will need to log in as root. On some Linux systems, you need to set up root as 
the CUPS system admin first. You can do that with the adduser command: 


~$ su
Password:
# adduser cupsys shadow
Adding user `cupsys' to group `shadow'...
Done.
# /etc/init.d/cupsys restart
Restarting Common Unix Printing System: cupsd [ ok ]
#


Then you can log in as root. 


Click on “Add Printer” and enter the printer name from the Windows system. We’ll
use “BrotherHL1440” (see Figure 8-13). Then enter the location and description.
When you get to the device window, click on the drop-down menu and select
“Windows Printer via SAMBA.”


Figure 8-13. Adding Windows printers


In the next window, “Device URI for,” enter the device URI. “BrotherHL1440-2” is 
connected to Philadelphia on Windows 2003, so you must enter the “guest” user- 
name and hostname: 


smb://guest@philadelphia/brotherhl1440-2


At this point, you have to select the printer driver. You should also print a test page.


On your Linux client, open the CUPS interface, and you should see the printer. 
Linux clients on the LAN can now use this printer. 


Controlling Print Queues from the Command Line 


You can ssh to a remote Linux print server and use CUPS commands to control print 
queues. CUPS CLI commands usually require root privileges. 


Let’s take a brief look at those commands: 


lpc
Allows various forms of control over printers. With lpc status, you can see a list
of available queues and the status of each.


lpstat
Displays a list of jobs queued for printing on the system’s printers. You can use
various options to modify this command’s output.


lpq
Displays the status of the current queue or the queue specified with the -P queue
option.


lppasswd
Changes the CUPS password used by the system. Set AuthType to Digest in the
cupsd.conf configuration file.


enable and disable
Starts or stops the specified queue. The most frequently used command is
disable with the -c option, to stop a queue and cancel all the jobs currently in the
queue.


accept and reject
Causes the print queue to begin accepting or rejecting new jobs.


lprm
Removes a job from the queue. You can specify the queue (-P queue) and the job
identifier (obtained with lpstat).


lpmove
Moves a print job from one queue to another with a job identifier and a queue
name (e.g., lpmove queue1-46 queue2).


You can try these commands on your own. Here’s an example of the first one on the 
printer we just set up using the CUPS interface: 
# lpc status 
BrotherHL1440: 
printer is on device 'parallel' speed -1 
queuing is enabled 
printing is enabled 
no entries 
daemon present 
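

Output like this is easy to check from a script when you are logged in over ssh. The following is a minimal sketch, not a CUPS tool: stopped_queues is a hypothetical helper name, and it assumes that a stopped queue is reported with the words “is disabled” in the same position where the listing above says “is enabled.”

```shell
#!/bin/sh
# Sketch: read "lpc status" output on standard input and report every queue
# whose queuing or printing is disabled.
# stopped_queues is a hypothetical helper, not a CUPS command.
stopped_queues() {
    awk '
        # Drop trailing whitespace so the queue header test below is reliable.
        { sub(/[ \t]+$/, "") }
        # A queue header is an unindented line ending in a colon.
        /^[^ \t].*:$/ { queue = substr($0, 1, length($0) - 1); next }
        # Indented status lines name what is enabled or disabled.
        /(queuing|printing) is disabled/ { print queue ": " $1 " is disabled" }
    '
}

# Typical use on the print server (needs the lpc command to be installed):
#   lpc status | stopped_queues
```

If the helper prints nothing, every queue is both accepting and printing jobs.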


User Management 


In Linux, you can manage users (add, change, delete) in many ways. In the begin- 
ning of this section, we’re going to assume that each server you administer has its 
own database of users, found in the /etc/passwd file. We’re also going to assume that 
you know the basics of adding and deleting user accounts with the commands 
adduser and useradd for whatever distribution you use, since they differ from distro 
to distro. 


Different Linux distributions have changed the default behavior of the
adduser/useradd commands. You can access manual pages for either command, but they
probably won’t work as the manpages indicate. You’ll have to experiment to see how your
distribution behaves. In Fedora, the two commands seem to behave the same: they
both add an account and a user directory. If you type either adduser tadelste or
useradd tadelste, the commands will add the user and create a home directory, but they
won’t ask for a temporary password or go through the standard Linux questions you
might expect to see.
On other distributions, you might see output like this:


# adduser tadelste
Adding user `tadelste'...
Adding new group `tadelste' (1001).
Adding new user `tadelste' (1001) with group `tadelste'.
Creating home directory `/home/tadelste'.
Copying files from `/etc/skel'
Enter new UNIX password: passwd1
Retype new UNIX password: passwd1
passwd: password updated successfully
Changing the user information for tadelste
Enter the new value, or press ENTER for the default
        Full Name []: New User
        Room Number []:
        Work Phone []: 999-555-1212
        Home Phone []:
        Other []:
Is the information correct? [y/N] y


On Fedora, however, the output stops at the “Copying files...” line. The administra- 
tor is then expected to create the first password for the user. But what if the adminis- 
trator doesn’t immediately assign the new user a password? Could the added user 
access the server through ssh, for instance? Let’s try it: 


$ ssh tadelste@host2.centralsoft.org
tadelste@host2.centralsoft.org's password:
Permission denied, please try again.
tadelste@host2.centralsoft.org's password:
Permission denied, please try again.
tadelste@host2.centralsoft.org's password:
Permission denied (publickey,gssapi-with-mic,password).
$


As you can see, the answer is no. The user doesn’t just have a blank password; he
doesn’t have a password at all. The sshd_config file has the password requirement
enabled, so the user can’t use SSH to log in either.
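

If you want to see which state an account is in, you can look at the second field of its entry in the shadow password file. The sketch below is ours, not a standard tool: password_state is a hypothetical helper name, and it assumes the usual shadow conventions in which an empty field means no password is required and a field beginning with ! or * cannot match any password. Reading /etc/shadow requires root privileges.

```shell
#!/bin/sh
# Sketch: classify an account's password field in a shadow-format file.
# password_state is a hypothetical helper, not part of any distribution.
password_state() {
    # $1: account name
    # $2: shadow-format file to read (defaults to /etc/shadow)
    awk -F: -v user="$1" '
        $1 == user {
            found = 1
            if ($2 == "")          print "empty"    # no password required
            else if ($2 ~ /^[!*]/) print "locked"   # no password can match
            else                   print "set"      # a password hash is present
        }
        END { if (!found) print "unknown" }
    ' "${2:-/etc/shadow}"
}
```

On a system like the one described here, running password_state tadelste as root right after adding the account would be expected to report locked rather than empty, which matches the ssh behavior above.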


The root user must therefore add a password for the user, which an administrator 
can do as follows: 


[root@host2 ~]# passwd tadelste
Changing password for user tadelste.
New UNIX password: passwd1
Retype new UNIX password: passwd1
passwd: all authentication tokens updated successfully.
[root@host2 ~]#


The output states that the passwd command is changing the password for the user, 
but it’s not; it does not ask for the (nonexistent) original password. 


As a user, once you’ve been assigned a password, you can change it yourself: 


$ passwd
Changing password for user tadelste.
Changing password for tadelste
(current) UNIX password: passwd1
New UNIX password: passwd1
Password unchanged
New UNIX password: passwd2
Retype new UNIX password: passwd2
passwd: all authentication tokens updated successfully.
$


Fedora first verifies that you have a password (if you don’t, you won’t be able to log
on to the server). It also verifies that the new password you enter is different from
your existing password. If you enter the same password, Fedora does not accept it
and prompts you again.


Since Fedora uses Red Hat’s protocol, you have to assume that some security issues 
must exist around the adding of users and setting of passwords. 


When you installed Fedora, the installation script prompted you to create a pass- 
word for the root account and an optional primary user account besides root. Other 
than that, you may have only scant experience with adding users, and little if any 
with group administration. 


System administrators need to know: 


• How to create and set up accounts

• How to delete or disable accounts

• The potential for security exploits associated with user management, and how to
remedy them


You should also be aware that user accounts serve a number of purposes on Linux 
systems, and that some “users” are not people. You'll see two major types of 
accounts: 


Accounts for real people 
Each user is given an account that is associated with a few configuration options, 
such as a password, a home directory, and a shell that runs when the user logs 
in. Providing separate accounts for each user allows people to set permissions on 
their files, so they can control who has access to them. 


Accounts for system services such as mail or a database server 

These accounts ensure that services run with very restricted privileges and have 
access only to a few necessary files, in case a programming error or a malicious 
intruder causes them to try to affect other parts of the system. Typically, when a 
service is installed, the installation process or the system administrator creates a 
user and group of the same name (postfix, mysql, etc.) and assigns them to all 
files and directories controlled by the service. Services are not given passwords, 
home directories, or shells, because only intruders would be likely to use these. 


As stated previously, if you’re reading this book, you should already know how to 
add users, set passwords, and so on. Now, we want to focus on the issues an admin- 
istrator needs to know about users from a security point of view. 
Removing a User


Employee turnover in many organizations runs high. So, unless you run a small shop 
with a stable user base, you need to learn how to clean up after an employee leaves. 
Too many so-called system administrators do not understand the stakes involved 
when they manage users. Disgruntled former employees can often cause significant 
trouble for a company by gaining access to the network. 


Removing a user isn’t a one-step process—you need to manage all of the user’s files, 
mailboxes, mail aliases, print jobs, recurring (automatic) personal processes (such as 
the backing up of data or remote syncing of directories), and other references to the 
user. It is a good idea to first disable the user’s account in /etc/passwd; after that, you 
can search for the user’s files and other references. Once all traces of the user have been 
cleaned up, you can remove the user completely (if you remove the entry from /etc/ 
passwd while these other references exist, you have a harder time specifying them). 


When you remove a user, it’s a good idea to follow a predetermined course of action 
so you don’t forget any important steps; you may even want to make a checklist so 
that you have a routine laid out. 


The first task is to disable the user’s password, effectively locking him out. You can 
do this with a command like the following: 


# passwd -l tadelste


Sometimes it’s necessary to temporarily disable an account without removing it. For 
example, a user might go on maternity leave or take a post for 90 days in another 
country. You may also discover from your system logs that someone has gained 
unauthorized control of an account by guessing its password. The passwd -l com- 
mand is useful for these situations as well. 


Next, you have to decide what to do with the user’s files. Remember that users may 
have files outside their home directories. The find command can find them: 


# find / -user tadelste
/home/tadelste
/home/tadelste/.zshrc
/home/tadelste/.bashrc
/home/tadelste/.bash_profile
/home/tadelste/.gtkrc
/home/tadelste/.bash_logout
...


You can then decide whether to delete these files or keep them. If you decide to 
delete them, back them up in case you need data from them later. 


As extra security, you can change the user’s login shell to a dummy value. Simply 
change the last field in the passwd file to /bin/false. 


If your organization uses Secure Shell (SSH, usually provided on Linux by
OpenSSH-server) and you allow remote RSA or DSA key authentication, a user can get access to
your system even if his password is disabled. This is because SSH uses separate keys.
For instance, even after you have disabled Tom Adelstein’s password, he can get on
another computer somewhere and run a command such as:


$ ssh -f -N -L8000:intranet.yourcompany.com:80 my.domain.com


This forwards traffic from port 8000 on his computer to port 80 (the port on which
a web server usually listens) on your internal server.


Obviously, if your system offers SSH, you should remove authorized keys from the
appropriate directories (e.g., ~tadelste/.ssh or ~tadelste/.ssh2) in order to stop the
user from regaining access to his account this way:


$ cd .ssh
:~/.ssh$ ls
authorized_keys known_hosts
:~/.ssh$ rm authorized_keys
:~/.ssh$ ls
known_hosts
:~/.ssh$


Likewise, look for .shosts and .rhosts files in the user’s home directory (for example, 
~tadelste/.shosts and ~tadelste/.rhosts). 


Also, check to see if the user still has any processes running on the system. Such pro- 
cesses might act as a backdoor to allow the user into your network. The following 
command will tell you if a user currently has any running processes: 


# ps aux | grep -i ^tadelste


Some other questions a system administrator might ask about a personal user who 
has left the company include: 


• Could the user execute CGI scripts from his home directory or on one of the
company’s web servers?

• Do any email forwarding files such as ~tadelste/.forward exist? Users can use
forwarders to send mail to their accounts and cause programs to be executed on the
system where they supposedly do not have access.
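

Several of the checks in this section look for the same handful of files, so they lend themselves to a small script you can add to your checklist. What follows is a sketch under stated assumptions rather than a complete audit: leftover_access_files is a hypothetical helper name, and the list covers only the files and directories mentioned above.

```shell
#!/bin/sh
# Sketch: list files in a departing user's home directory that could still
# grant access or trigger programs (SSH keys, .shosts, .rhosts, .forward).
# leftover_access_files is a hypothetical helper; the list is limited to
# the items discussed in this section.
leftover_access_files() {
    # $1: the user's home directory, e.g. /home/tadelste
    for f in .ssh/authorized_keys .ssh2 .shosts .rhosts .forward; do
        # Print the full path of each item that exists.
        [ -e "$1/$f" ] && printf '%s\n' "$1/$f"
    done
    return 0
}
```

You might run leftover_access_files /home/tadelste as root after locking the account, then review each path it prints before deciding what to remove.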


Sealing the Home Directory 


You will often find that management wants to retain the information in the home
directory of an employee who leaves. All the email and other documents in a
personal user’s account belong to the company. In the event that a disgruntled former
employee becomes litigious, the company’s legal counsel may want access to these
files. Many analysts consider keeping such directories good practice.


You can save the contents of a user’s home directory by renaming it. Simply execute 
a move command: 


# mv /home/tadelste /home/tadelste.locked


This prevents the former employee from logging in or making use of configuration
files such as the .forward file discussed in the previous section. The contents remain
intact in case they’re needed later.


Graphical User Managers 


As Linux’s market penetration began to increase earlier in the decade, companies 
such as Sun Microsystems, Novell, Computer Associates, HP, and IBM started port- 
ing their administrative toolkits to Red Hat, SUSE, and other Linux platforms. Addi- 
tionally, the administrative tools bundled with Linux distributions began to mature, 
with increases in both function and usability. 


Since you now have some knowledge of the commands and processes required to 
create and clean up a personal user account, you should find these utilities easy to 
use. Generally, though, you will find them less flexible than using the command line. 


Let’s take a look at an example of one such tool, originally built on a SUSE utility called 
YaST2. Sun’s Java Desktop Configurator is pictured in Figure 8-14. Descriptions of the 
functions you can perform with this tool are provided in the panel on the left. 


Figure 8-14. Sun Microsystems’s JDS User Manager


Notice that the dialog box at the top is asking whether you want to delete the
directory /home/tadelste. As we discussed previously, your company may wish to retain
the home directories of former employees. In this case, the graphical tool gives you
only two options: either to delete the directory or not. It does not give you the option
of renaming the directory, which, as we discussed earlier, may be the most secure
and convenient course to take.


In Figure 8-15, you can see another example taken from our Fedora system. 


Figure 8-15. Fedora User Manager graphical user management tool


With the Fedora graphical user management tool, you can perform the same basic 
functions as the ones outlined in Figure 8-14. Again, it may not provide all of the 
options you need to properly manage the accounts of departing users. 


Although it’s not technically a user manager, Fedora offers another tool that you can
use to configure a number of services related to users. Take a look at Figure 8-16, the
graphical tool provided by Fedora when you type the text command setup.


Figure 8-16. Red Hat Authentication Configurator


This is another example of the many ways Linux provides to manage user accounts. 
It does not require you to run the X Window System. 


CHAPTER 9


Virtualization in the 
Modern Enterprise 














In this chapter, we address an area experiencing explosive growth in demand for 
Linux system administrators. Linux virtualization lies at the heart of today’s trends in 
data center consolidation, high-performance computing, rapid provisioning, busi- 
ness continuity, and workload management. Enterprises are seeing real cost savings 
because of Linux virtualization, and analysts are noting that the technology is chang- 
ing the business landscape. 


Virtualization is a concept that has gained popularity thanks to the successful
company VMware (http://www.vmware.com) and the open source project Xen
(http://www.cl.cam.ac.uk/research/srg/netos/xen). It refers to one piece of hardware running
multiple kernels (which are sometimes all the same and sometimes from completely
different operating systems) on top of a lower layer of software that manages their
access to the hardware. Each kernel, called a guest, acts as if it has the whole
processor to itself.


The different guests are isolated from each other much more than processes are iso- 
lated within a single operating system. This isolation provides security and robust- 
ness, because a failure or compromise in one guest doesn’t affect the others. The 
virtualization layer performs many functions of an operating system, managing 
access to processor time, devices, and memory for each guest. 


At the time of this writing, the Linux developers are working on a new system called
the Kernel-based Virtual Machine (KVM), which will be part of the kernel.


Why Virtualization Is Popular 


To understand who is using virtualization and the environments in which it’s valu- 
able, you should understand a bit about current business needs. This section pro- 
vides that background before we explain how Linux virtualization works. 


The entire field of information technology has grown exponentially since the advent
of common distributed filesystems. Organizations have seen their infrastructures
expand year by year. Many attribute this growth to constant improvements in
computer components and software. But that’s not the whole picture.


Computer technology has evolved from a focus on managing transactions to harness- 
ing business processes. Some firms specialize in human resource management, oth- 
ers in finance and accounting, and still others in manufacturing and supply chain 
management. This specialization has created fiefdoms in data centers and among IT 
staffs. 


Traditional networks are now able to capture and manage more and different kinds 
of transactions than ever before, and this has created the need for increased comput- 
ing power and subsequently more storage. Growth has also occurred in the number 
of places and ways we store data, which in itself has created server sprawl (see 
Figure 9-1). 

Figure 9-1. Sprawling server farms, one operating system per box


Now add another piece to the mix: specialized applications for fields such as 
accounting and finance nearly always run on separate, highly available servers with 
redundant hardware for the sake of ensuring business continuity. This combination 
of factors has transformed the IT landscape into a welter of isolated, single-function, 
oversized and underutilized physical servers. 


On top of all this comes the increasing burden of regulatory compliance, which 
causes costs to grow again: you have to increase your capacity to store and retrieve 
documents, and in many cases you’re expected to store them for up to 25 years. 


Consider what that means. Your successors won’t necessarily have the technology 
available to produce the documents an auditor or attorney might want a decade from 
now, much less in a quarter of a century. 


Let’s take another look at the results of computer growth. We have: 


• Single-function servers and applications (often known as “silos”) with underused
capacity

• Additional cost increases because of the complexity of software and the need to
manage ever-increasing amounts of data

• The need for staff to specialize into functional areas where you will find a lack of
documentation and high levels of personnel turnover

• The need to train and support users and administrators and keep software
up-to-date


Now you might understand why enterprise virtualization has gained popularity and 
become one of the few areas where technology can change the business landscape. 
With virtual images, you can easily compress your data together with all the pro- 
grams, configuration settings, operating system libraries, and other metadata that 
make a whole system. Restoring an image restores the system exactly as it was run- 
ning at the time, thus making it easier to reproduce documents. Virtualization has 
the following benefits: 


• It replaces wasteful arrays of systems with fewer, better-utilized systems.

• It simplifies administration, because separate kernels with one application run-
ning on each are more secure and manageable than one kernel running many
applications. It also maintains the environment in which documents were cre-
ated, to meet regulatory requirements.

• It reduces hardware and complexity, which in turn allows a smaller staff.

• It may help reverse the trend of server sprawl.


High-Performance Computing 


Linux has become the preferred host operating system for virtual machines because 
of its ability to run and manage massive PC clusters and grids. It took a while for the 
major hardware vendors to catch on, but once they did they saw big dollar signs. For 
several years Linux has enjoyed benefactors willing to contribute personnel and 
advanced technology to its development effort. Such contributors include IBM, Intel, 
AMD, HP, Novell, Red Hat, Unisys, Fujitsu, and dozens of others. 


For example, IBM needed a utility operating system for its OpenPower initiative. 
Suddenly, Linux ran on Big Blue’s Virtualization Engine in the form of an open 
source hypervisor and accompanying technologies. IBM’s engine allows Linux to cre- 
ate and manage partitions and dynamically allocate I/O resources to them. 


Then Linux kernel developers announced their new simultaneous multi-threading 
(SMT) and hyper-threading technology. Linux can now enable two threads to exe- 
cute simultaneously on the same processor—an essential technology to act as a host 
for guest operating systems. Thus, VMware runs well on top of Linux and provides a 
virtualization layer for other instances of Linux or other operating systems. 
User-Mode Linux (UML) is another example of Linux forming a foundation 
for virtualization. 


The 2.6 Linux kernel fits well with IBM’s SMT technology. Prior to this version of 
the kernel, Linux had insufficient thread-scheduling and arbitration-response 
characteristics. The 2.6 kernel fixed that problem and greatly expanded the number
of processors on which the kernel could run. 


This is important for two reasons. First, as a host for virtual machines, Linux has to 
perform well and excel at managing its hardware. Second, as a guest divorced from 
its physical hardware, it has to maintain its performance and capacity to handle vari- 
ous processes as the host. Today, Linux makes both a great host and a great guest 
OS. It manages hardware and virtual partitioning and runs well in the guest parti- 
tions, thanks to HP and IBM. 


If you’ve ever wondered why companies like XenSource and Virtual Iron suddenly 
appeared out of thin air, now you know: it’s because of open source hypervisor con- 
tributions. Like the hardware vendors that realized Linux could enhance PC and data 
center component sales, software vendors jumped on the bandwagon. Even 
Microsoft eventually realized it needed to get in on the Linux game, contributing to 
both XenSource and Virtual Iron. 


Business Continuity and Workload Management 


Even on a small scale, your organization will benefit from separating email, DNS,
and web servers, as well as directories, gateways, and databases. Placing each of
these services on its own server ensures that if one server goes down, your entire
infrastructure doesn’t collapse. But separating your services on physical hardware requires a
lot of time, space, money, and overhead. You also need to back up and restore your 
data, provide for catastrophes, and deploy the best hardware for the job. 


With Linux virtualization, you can partition a single physical server into a group of 
virtual ones. Each virtual server appears like a physical one to system administrators. 
You can create a separate server instance for each service you want to provide: email, 
DNS, web serving, and so on. If one fails, you won’t mangle the others. 


Partitioning the physical host also enables you to create a different configuration for 
each virtual server on the same physical hardware. In one environment, for example, 
we created smaller virtual machines (VMs) for our DNS servers and larger ones for 
email and web serving. This allowed us to spread the workload and maintain the 
same physical hardware. Figure 9-2 gives a sense of what you can accomplish with a 
single physical server. 


Rapid Provisioning 


We first accomplished virtualization on our network by creating a minimal installa- 
tion of Debian in a VM. Once we got it tuned to our needs, we compressed it and 
put it on CD-R media. We then set up our additional virtual machines using 
VMware with different configurations, and copied the compressed image into each 
directory we specified for a VM. 

Figure 9-2. Partitioning a single physical server into multiple virtual machines


Each VM lives in a directory. For example, our main directory, /var/lib/vmware/
Virtual Machines, contains several subdirectories such as
debian-31r0a-i386-netinst-kernel2.6. We simply compressed that subdirectory and
used it for deployment to other subdirectories with slightly different names.
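
The compress-and-copy routine described in this note can be sketched as a pair of shell functions. This is only an illustration of the idea, not the procedure we actually ran: pack_vm and deploy_vm are hypothetical names, and the sketch assumes GNU tar (for the --strip-components option) and a VM that is powered off while it is copied.

```shell
#!/bin/bash
# Sketch only: pack_vm and deploy_vm are hypothetical helper names,
# not VMware tools. Assumes GNU tar and a powered-off VM.

# pack_vm SRC_DIR ARCHIVE
# Compress one VM directory into a gzip'd tar archive.
pack_vm() {
    local src=$1 archive=$2
    # -C switches to the parent directory so the archive holds a single
    # top-level directory named after the VM.
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")"
}

# deploy_vm ARCHIVE DEST_DIR
# Unpack the archive into DEST_DIR, dropping the original top-level
# directory name so the copy can live under a different name.
deploy_vm() {
    local archive=$1 dest=$2
    mkdir -p "$dest"
    tar -xzf "$archive" -C "$dest" --strip-components=1
}
```

For example, pack_vm "/var/lib/vmware/Virtual Machines/debian-base" /tmp/debian-base.tar.gz followed by deploy_vm /tmp/debian-base.tar.gz "/var/lib/vmware/Virtual Machines/dns1" would produce a second VM directory called dns1 (both directory names are made up for the example).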








We also set up Xen virtual machines using Fedora minimal installations. We then 
added the components we needed for each service we wanted to provide. For exam- 
ple, our primary DNS server runs in a Xen virtual machine, while our web and mail 
servers run in separate instances of VMware. 


After we got a server (say, email) running, we made a compressed copy of it and 
burned it to a CD-R. We regularly and systematically back up each virtual server 
onto optical media such as CDs and DVDs. We also tried moving the images to differ-
ent distributions of Linux, and they ran just as they had previously.


How Virtualization Helps 


What did we accomplish with virtualization? First, we eliminated several physical 
servers. We deployed our preferred operating system as an image, so we needed to 
go through the installation process only once. We then created virtual machines on 
spare hardware and systematically copied our virtual images to allow for instant 
recovery in case of a system failure. 


Virtualization works well for small companies, allowing them to build an infrastruc- 
ture with free software. Imagine the cost savings just from licensing fees! Now, imag- 
ine what kinds of strategies large companies can implement using Linux. 

By now, you may be anxious to see how all of this works. So, let’s go through the
process of installing and configuring Xen and VMware and demonstrate how to vir- 
tualize a server network. 


Installing Xen on Fedora 5 


In this section of the chapter, we’ll show you how to install Xen on a single machine 
to manage two operating systems. As Xen makes its way into the standard Linux dis- 
tributions, installation will become smoother. But for now, some manual labor is 
needed. 


We’re using Fedora Core 5 (FC5) as the Xen host operating system, since it supports 
Xen 3.0 out of the box. Let’s ask yum (a package manager similar to Debian’s apt-get 
or Red Hat’s up2date) about Xen: 

# yum info xen
Loading "installonlyn" plugin
Setting up repositories
core                                                       [1/3]
updates                                                    [2/3]
extras                                                     [3/3]
Reading repository metadata in from local files
Available Packages
Name   : xen
Arch   : i386
Version: 3.0.2
Release: 3.FC5
Size   : 1.4M
Repo   : updates
Summary: Xen is a virtual machine monitor
Description:
This package contains the Xen hypervisor and Xen tools, needed to
run virtual machines on x86 systems, together with the kernel-xen*
packages. Information on how to use Xen can be found at the Xen
project pages.

Virtualisation can be used to run multiple versions or multiple
Linux distributions on one system, or to test untrusted applications
in a sandboxed environment. Note that the Xen technology is still
in development, and this RPM has received extremely little testing.
Don't be surprised if this RPM eats your data, drinks your coffee
or makes fun of you in front of your friends.


That sounds encouraging. Let’s try it, but first check some requirements:

• The system must have at least 256 MB of RAM.
• grub must be your boot loader.
• SELINUX must be disabled or permissive, but not enforcing.

Run the system-config-securitylevel program or edit /etc/selinux/config to look as
follows:


# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#       enforcing - SELinux security policy is enforced.
#       permissive - SELinux prints warnings instead of enforcing.
#       disabled - SELinux is fully disabled.
SELINUX=disabled
# SELINUXTYPE= type of policy in use. Possible values are:
#       targeted - Only targeted network daemons are protected.
#       strict - Full SELinux protection.
SELINUXTYPE=targeted








If you changed the SELINUX value from enforcing, you'll need to reboot Fedora before 
proceeding. 
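
If you would rather check the setting from a script than by eye, something like the following sketch could do it. The function names selinux_mode and xen_selinux_ok are our own inventions for this example; the only assumption about the file is that it contains an uncommented SELINUX= line, as in the listing above.

```shell
#!/bin/bash
# Sketch only: selinux_mode and xen_selinux_ok are hypothetical helpers.

# selinux_mode [CONFIG_FILE]
# Print, in lowercase, the value of the last uncommented SELINUX= line.
# SELINUXTYPE= is not matched, because the pattern requires "=" right
# after "SELINUX".
selinux_mode() {
    local config=${1:-/etc/selinux/config}
    sed -n 's/^SELINUX=[[:space:]]*\([A-Za-z]*\).*/\1/p' "$config" |
        tail -n 1 | tr '[:upper:]' '[:lower:]'
}

# xen_selinux_ok [CONFIG_FILE]
# Succeed only when the configured mode is disabled or permissive,
# which is the requirement stated in the list above.
xen_selinux_ok() {
    case "$(selinux_mode "$1")" in
        disabled|permissive) return 0 ;;
        *) return 1 ;;
    esac
}
```

A typical use would be xen_selinux_ok || echo "SELinux is enforcing; change it and reboot first". Note that this reads only the configuration file, not the state of the running system.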


This command will install the Xen hypervisor, a Xen-modified Fedora kernel called 
domain 0, and various utilities: 


# yum install kernel-xen0


The need for a special Xen-modified Linux kernel may disappear in
the future as Intel and AMD introduce virtualization support in their
chips. Windows Vista is also expected to support virtualization at the
processor level.








This adds xen0 as the first kernel choice in /boot/grub/grub.conf, but not the default: 


# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/VolGroup00/LogVol00
#          initrd /initrd-version.img
#boot=/dev/hda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Fedora Core (2.6.17-1.2157_FC5xen0)
        root (hd0,0)
        kernel /xen.gz-2.6.17-1.2157_FC5
        module /vmlinuz-2.6.17-1.2157_FC5xen0 ro root=/dev/VolGroup00/LogVol00
        module /initrd-2.6.17-1.2157_FC5xen0.img
title Fedora Core (2.6.17-1.2157_FC5)
        root (hd0,0)
        kernel /vmlinuz-2.6.17-1.2157_FC5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.17-1.2157_FC5.img
title Fedora Core (2.6.15-1.2054_FC5)
        root (hd0,0)
        kernel /vmlinuz-2.6.15-1.2054_FC5 ro root=/dev/VolGroup00/LogVol00
        initrd /initrd-2.6.15-1.2054_FC5.img


To make the Xen kernel the default, change this line:
default=1
to:
default=0
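
If you prefer not to open an editor, the same one-line change can be made with sed. The function below is a rough sketch rather than a Fedora tool: set_grub_default is a name we made up, and it assumes GNU sed (for the -i option) and a grub.conf whose default= line starts in the first column, as in the listing above. It keeps a .bak copy of the file so you can back out.

```shell
#!/bin/bash
# Sketch only: set_grub_default is a hypothetical helper, not part of grub.

# set_grub_default FILE INDEX
# Rewrite the "default=N" line in FILE to point at menu entry INDEX
# (entries are counted from 0), saving the old file as FILE.bak.
set_grub_default() {
    local file=$1 index=$2
    cp -p "$file" "$file.bak" &&
        sed -i "s/^default=[0-9]*[[:space:]]*\$/default=$index/" "$file"
}
```

On the system shown here you would run set_grub_default /boot/grub/grub.conf 0 as root, and then confirm the result with grep '^default=' /boot/grub/grub.conf.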
Now you can reboot. Xen should start automatically, but let’s check: 


# /usr/sbin/xm list 
Name ID Mem(MiB) VCPUs State Time(s) 
Domain-0 0 880 1 r----- 20.5 


The output should show that Domain-0 is running. Domain 0 controls all the guest
operating systems that run on the processor, similarly to how the kernel controls
processes in an operating system.


Installing a Xen Guest OS


Xen is now in control of the processor, but you need to add at least one guest operat- 
ing system. We’ll start with installing a Fedora Core 5 guest, because it facilitates the 
job, and then we’ll offer some tips for other variants of Linux. 


Fedora Core 5 


Fedora Core 5 has a Xen guest installation script that simplifies the process, although 
it installs only FC5 guests. The script expects to access the FC5 install tree via FTP, 
the Web, or NFS; for some reason, you can’t specify a directory or file. We’ll use our 
FC5 installation DVD and serve it with Apache: 


# mkdir /var/www/html/dvd
# mount -t iso9660 /dev/dvd /var/www/html/dvd
# apachectl start


Now we'll run the installation script and answer its questions: 


# xenguest-install.py
What is the name of your virtual machine? guest1
How much RAM should be allocated (in megabytes)? 256
What would you like to use as the disk (path)? /xenguest
What is the install location? http://127.0.0.1/dvd





At this point, the FC5 installation begins. Choose between text mode and graphic 
mode (if X is running) via vnc. If you choose text mode, you'll be connected to a con- 
sole. Proceed as you normally would for a Fedora or Red Hat installation. On the IP 
address screen, give the guest a different address from the host, or use DHCP (if you 
said dhcp="dhcp" in the Xen configuration file, which is explained in the next sec- 
tion). The last screen will ask you to reboot. Unmount the DVD and eject it. You will 
be rebooting only your new guest system, not Xen or the host. 

Xen does not start the guest operating system automatically. You need to type this
command on the host: 


# xm create guest1 


At this point, you'll have two operating systems (host1 and guest1) operating inde- 
pendently and living in harmony, each with its own filesystems, network connec- 
tions, and memory. To prove that both servers are running, try these commands: 


# xm list
Name                              ID Mem(MiB) VCPUs State  Time(s)
Domain-0                           0      128     1 r-----   686.0
guest1                             3      256     1 -b----    14.5

# xentop
xentop - 21:04:38   Xen 3.0-unstable
2 domains: 1 running, 1 blocked, 0 paused, 0 crashed, 0 dying, 0 shutdown
Mem: 982332k total, 414900k used, 567432k free    CPUs: 1 @ 2532MHz
      NAME  STATE   CPU(sec) CPU(%)     MEM(k) MEM(%)  MAXMEM(k) MAXMEM(%) VCPUS NETS NETTX(k) NETRX(k) SSID
  Domain-0 -----r        686    0.3     131144   13.4   no limit       n/a     1    8  1488528    80298    0
    guest1 --b---         14    0.1     261996   26.7     262144      26.7     1    1      129      131    0
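
Because xm list prints one domain per line, its output is easy to post-process in a script. The following sketch (summarize_domains is a hypothetical name of ours) assumes the six-column layout shown above and prints just the name, memory, and state of each domain.

```shell
#!/bin/bash
# Sketch only: summarize_domains is a hypothetical helper. It assumes the
# six-column "xm list" layout shown in the text.

# summarize_domains
# Read "xm list" output on standard input and print "name memory state"
# for each domain, skipping the header line.
summarize_domains() {
    awk 'NR > 1 && NF >= 6 { print $1, $3, $5 }'
}
```

You would use it as xm list | summarize_domains. In the state column, r marks a domain that is currently running and b marks one that is blocked (idle and waiting for something to do).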


To start Xen domains automatically, use these commands: 


# /sbin/chkconfig --level 345 xendomains on 
# /sbin/service xendomains start 


Other guests 


If you want a guest OS other than FC5, you'll need to edit a Xen guest configuration 
file, which is a text file (actually, a Python script) in the /etc/xen directory. 
xmexample1 and xmexample2 are commented sample files. For the full file syntax, 
see: 


# man xmdomain.cfg 


When we ran xenguest-install.py in the previous section, it generated the Xen guest 
configuration /etc/xen/guest1, with a few extra lines: 


# Automatically generated Xen config file
name = "guest1"
memory = "256"
disk = [ 'file:/xenguest,xvda,w' ]
vif = [ 'mac=00:16:3e:63:c7:76' ]
uuid = "bc2c1684-c057-99ea-962b-de44a038bbda"
bootloader="/usr/bin/pygrub"

on_reboot = 'restart'
on_crash = 'restart'


This contains some, but not all, of the directives a guest needs. A minimal guest con- 
figuration file looks something like this: 

1. A unique guest domain name:
name="vm01" 

2. A Xen-enabled kernel image pathname for the guest domain: 
kernel="/boot/vmlinuz-2.6.12.6-xenU" 

3. A root device for the guest domain: 
root="/dev/hda1" 

4. Initial memory allocation for the guest, in megabytes: 


memory=128 


The sum of the memory for all Xen guests must not exceed physical 
memory minus 64 MB for Xen itself. 








5. The disk space for the guest domain. This is defined in one or more disk block
device stanzas, each enclosed in single or double quotes:
disk = [ 'stanza1', 'stanza2' ]
A stanza consists of a string of three parameters ('host_dev,guest_dev,mode').
host_dev is the domain’s storage area as seen by the host. This may be one of:

file:pathname
    A loopback file image (a single local file that Xen treats as a filesystem); this
    is created when you run xm create or the xen-create-image program.
phy:device
    A physical device.

guest_dev is the physical device as seen by the guest domain, and mode is r for
read-only or w for read-write. Thus, a sample disk directive that gives one guest
two devices (a root filesystem and swap space) is:

disk = [ 'file:/vserver/images/vm01.img,hda1,w', 'file:/vserver/images/vm01-swap.img,hda2,w' ]

6. Network interface information in a vif directive. This directive may contain a
stanza for each network device. The default network is specified with:
vif = [ '' ]
A dhcp directive controls whether DHCP is used or the interface information is
hard-coded. The following specifies the use of DHCP:
dhcp="dhcp"
If the dhcp directive is missing or set to "off", you must specify network informa-
tion statically, as you do when configuring a system:

ip="192.168.0.101"
netmask="255.255.255.0"
gateway="192.168.0.1"
hostname="vm01.example.com"
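
The memory rule noted under item 4 is simple arithmetic, so it is worth checking before you start another guest. Here is a minimal sketch, assuming you already know the machine's physical memory in megabytes; guest_memory_fits is a name we invented, and the 64 MB reserve is the figure given in the note above.

```shell
#!/bin/bash
# Sketch only: guest_memory_fits is a hypothetical helper.

# guest_memory_fits PHYSICAL_MB GUEST_MB...
# Succeed when the guests' combined memory stays within physical
# memory minus the 64 MB reserved for Xen itself.
guest_memory_fits() {
    local physical=$1 total=0 mb
    shift
    for mb in "$@"; do
        total=$((total + mb))
    done
    [ "$total" -le $((physical - 64)) ]
}
```

For instance, on a machine with 512 MB of RAM, guest_memory_fits 512 256 192 succeeds (448 MB is exactly the limit), while guest_memory_fits 512 256 256 fails.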

The xm manpage gives the following example of a minimal guest, with a loopback
file image on the host appearing as the root device on the guest: 

kernel = "/boot/vmlinuz-2.6-xenU" 

memory = 128 

name = "MyLinux" 

root = "/dev/hda1 ro" 

disk = [ "file:/var/xen/mylinux.img,hda1,w" ] 
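
If you create several guests, you may want to generate files like this one rather than type them. The function below is a sketch along those lines, not a Xen utility: write_guest_config and its argument order are our own invention, and it writes only the five directives from the manpage example, always mapping the image to hda1.

```shell
#!/bin/bash
# Sketch only: write_guest_config is a hypothetical helper, not a Xen tool.

# write_guest_config NAME MEMORY_MB KERNEL IMAGE [DIR]
# Write a minimal guest configuration file named NAME into DIR
# (default /etc/xen), using a loopback file image as the root device.
write_guest_config() {
    local name=$1 memory=$2 kernel=$3 image=$4 dir=${5:-/etc/xen}
    cat > "$dir/$name" <<EOF
kernel = "$kernel"
memory = $memory
name = "$name"
root = "/dev/hda1 ro"
disk = [ "file:$image,hda1,w" ]
EOF
}
```

Calling write_guest_config MyLinux 128 /boot/vmlinuz-2.6-xenU /var/xen/mylinux.img reproduces the manpage example above as /etc/xen/MyLinux. The image file itself still has to exist and contain a root filesystem.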


Once you have a guest configuration file, create the Xen guest with this command: 
# xm create -c guest_name 


where guest_name can be a full pathname or a relative filename (in which case Xen
looks for it in /etc/xen/guest_name). Xen will create the guest domain and try to boot it
from the given file or device. The -c option attaches a console to the domain when it 
starts, so you can answer the installation questions that appear. 


Installing VMware 


VMware has made its server available for free, although the product itself is not open source. You
can find it at http://www.vmware.com/products/server. We found it robust and user- 
friendly. You can read about VMware’s open source and community source initia- 
tives on its web site. 


As we mentioned earlier, startups such as XenSource and Virtual Iron have taken 
advantage of the Linux kernel’s support of hypervisor technology from IBM. Under 
competitive pressure from Xen, VMware has also submitted its own open source 
contributions to the kernel developers, realizing that VMware will run better on 
Linux if VMware gives the Linux kernel a little help. 


While we ran Xen using Fedora Core 5, we decided to install VMware on an Ubuntu 
server as our host and used Debian as our guest operating system. We also managed 
remote VMware instances from an Ubuntu desktop using the VMware console. 
Later, we installed FC5 under a VMware virtual machine. 


We downloaded VMware-server-1.0.1-29996.tar.gz and decompressed it to an installation
directory called vmware-server-distrib. Inside the directory we found vmware-install.pl
and ran it with the command ./vmware-install.pl. Soon afterward, the installation
program began and displayed the following messages:


Creating a new installer database using the tar3 format. 
Installing the content of the package. 


In which directory do you want to install the binary files? 
[/usr/bin] 


VMware Server’s installation begins with several questions like this, based on the 
installation script’s sniffing of your operating system and file layouts. 

During the installation process, the script asks you to accept VMware’s product
license. You should read it before accepting it. After you agree to the license, 
VMware verifies that the compiler and header files on your system are compatible 
with each other and builds the VMware binaries using your compiler. You will see 
messages such as: 

The path "/usr/lib/vmware" does not exist currently. This program is going
to create it, including needed parent directories. Is this what you want?
[yes]


Additionally, you will see code compilations like the following example:

make[1]: Entering directory '/usr/src/linux-headers-2.6.15-26-k7'
CC /tmp/vmware-config0/vmnet-only/driver.o
CC /tmp/vmware-config0/vmnet-only/hub.o
CC /tmp/vmware-config0/vmnet-only/userif.o
CC /tmp/vmware-config0/vmnet-only/netif.o
CC /tmp/vmware-config0/vmnet-only/bridge.o
CC /tmp/vmware-config0/vmnet-only/procfs.o
CC /tmp/vmware-config0/vmnet-only/smac_compat.o
SHIPPED /tmp/vmware-config0/vmnet-only/smac_linux.x386.o
LD /tmp/vmware-config0/vmnet-only/vmnet.o
Building modules, stage 2. 
MODPOST 


Toward the end of the installation, the script will inform you that installation of the 
code has completed and offer you a command you can use if you ever want to unin- 
stall the server: 


The installation of VMware Server 1.0.1 build-29996 for Linux completed 
successfully. You can decide to remove this software from your system at any 
time by invoking the following command: "/usr/bin/vmware-uninstall.pl". 


The installation script will also ask you to run the configuration command: 


Before running VMware Server for the first time, you need to configure it by 
invoking the following command: "/usr/bin/vmware-config.pl". Do you want 
this program to invoke the command for you now? [yes] 


As the installation process ends, you will see the following messages: 


Starting VMware services:
   Virtual machine monitor                              done
   Virtual Ethernet                                     done
   Bridged networking on /dev/vmnet0                    done
   Host-only networking on /dev/vmnet1 (background)     done
   Host-only networking on /dev/vmnet8 (background)     done
   NAT service on /dev/vmnet8                           done
   Starting VMware virtual machines                     done


The configuration of VMware Server 1.0.1 build-29996 for Linux for this 
running kernel completed successfully. 

You can download an existing operating system image, which VMware calls an
appliance, from http://www.vmware.com/vmtn/appliances/directory. We chose
debian-31r0a-i386-netinst-kernel2.6.zip, which we placed under the /var/lib/vmware/
Virtual Machines directory and decompressed.


Once we had our basic image, we started the VMware management console on an
Ubuntu desktop behind a firewall at a remote location. We ran the command:


$ gksu vmware-server-console 


We then configured the console to connect to our guest operating system remotely. 
With the VMware Server Console running, we connected to the remote virtual 
machine and logged on as root, as shown in Figure 9-3. 





Figure 9-3. Connecting to a remote virtual host


After we connected to the remote host, VMware prompted us to create a virtual 
machine. Because we’d already created one, we instead clicked on the File menu and 
opened the directory that contained our existing instance of Debian. This action 
added Debian to the VM inventory. Our console then appeared similar to Figure 9-4, 
which gave us an idea of the operating functions available. 


We were then able to start Debian. As the system booted, Debian began to run the 
later phases of its installation script. We let it run, and within a short time we got to 
the screen in Figure 9-5. 


We opted to configure Debian manually instead of choosing one of the predefined 
configurations. That allowed us to create a default Debian server to deploy in addi- 
tional instances of VMware Server. Figure 9-6 shows the running Debian system. 


The screenshot shows us running the command ifconfig. We tested this instance to 
make sure our virtual Ethernet cards were correctly bound to the IP addresses we set up. 

Figure 9-4. Connected to a remote host ready to start up








Figure 9-5. The Debian installation script running under a remote virtual machine


Once we had our basic Debian image, we zipped it up and burned it to CD-R media. 
We then deployed that image on the other hosts, after we’d determined each guest 
system’s role and resource requirements. 


Figure 9-7 provides a summary of the Debian image. On the right side of the screen 
you can see the configuration of the host. We can alter the virtual server dynamically 
to add memory, disk space, Ethernet cards, processors, and various devices as the 
need arises and as we set up additional machines. 

Figure 9-6. The installed instance of Debian on its remote host

Figure 9-7. Console summary of our basic Debian guest image


Installing a VMware Guest OS


For our final task, installing another operating system, we downloaded Fedora Core 
5 from VMware’s community site, moved it to the Virtual Machines directory, and 
decompressed it as we did with Debian. Next, we added it to our inventory through 
the File menu. Figure 9-8 shows a question about a unique identifier; you can keep 
the existing one. 





Figure 9-8. VMware asks about a virtual machine image’s unique identifier


VMware’s management console noticed we added an image. In order to distinguish 
between possible multiple images, it prompted us for a unique identifier (UUID) in 
the dialog shown in Figure 9-8. Because we copied Fedora 5 and have all the files 
making up the image, it did not matter which option we chose from the dialog. 


When you open a new virtual machine, VMware gives you a chance to verify the vir- 
tual hardware configuration. Figure 9-9 gives you an idea of the virtual hardware 
inventory available for Fedora Core 5. 


In addition to downloading images and loading them into the management con- 
sole, you can install a Linux operating system from a standard Linux distribution’s 
CD-ROM. 

Figure 9-9. VMware virtual hardware configuration for Fedora Core 5


Virtualization: A Passing Fad? 


Many analysts say they will sit on the sidelines and wait to see whether Linux virtual- 
ization takes hold. As a system administrator, you might want to weigh the risks and 
rewards of mastering this technology. Virtualization is not the equivalent of IBM’s 
introduction of the PC or Microsoft’s introduction of distributed filesystems. The 
impact of hypervisor technology doesn’t even compare to that of ERP programs such 
as SAP, PeopleSoft, or Oracle Financials. 


In any case, technologies such as Xen and VMware have undeniable benefits. Virtual- 
ization improves the utilization of servers and reduces overprovisioning of hardware 
by consolidating system resources. By running your current software in a virtual 
environment, you can not only preserve your investment in that software but take 
greater advantage of low-cost, industry-standard servers. 


Hopefully, this chapter has provided you with the knowledge and skills you need to 
implement your own virtualized environments. You now have the opportunity to 
experiment and have fun with free virtualization technology. Doing so could posi- 
tion you as a specialist in a field few understand. 


CHAPTER 10
Scripting 














As a Linux system administrator, you'll use two tools more than any others: a text 
editor to create and edit text files, and a shell to run commands. At some point you'll 
tire of typing repetitive commands and look for ways to save your fingers and reduce 
errors. That’s when you’ll combine the text editor and the shell to create the sim- 
plest Linux programs: shell scripts. 


Linux itself uses shell scripts everywhere, especially for customizable tasks such as 
service and process management. If you understand how those system scripts are 
written, you can interpret the steps they’re taking and adapt them for your own 
needs. 


The shell (an interface to the operating system) is one of many innovations inherited 
from Linux’s great-grandfather, Unix. In 1978, Bell Labs researcher Stephen Bourne 
developed the Bourne Shell for Version 7 Unix. It was called sh (Unix valued terse- 
ness), and it defined the standard features that all shells still display. Shells evolved 
from that foundation, leading to the development of the Korn shell (ksh, or course), 
the C shell (csh), and finally the Bash shell (bash) that is now standard on GNU/ 
Linux systems. bash is a pun/acronym for Bourne-Again Shell, and it still supports 
scripts written for the original Bourne shell. 


This chapter starts with the bash basics: shell prompts, commands and arguments, 
variables, expressions, and I/O redirection. If you’re familiar with these already, you 
won't miss much by skipping ahead a few pages (except perhaps a cure for insomnia). 


Every tool has its limits, and at some point you may find that bash is not the best 
solution to all your problems. Toward the end of this chapter we’ll examine a small 
application written in a number of scripting languages: bash as well as Perl, PHP, and 
Python (the three Ps associated with the LAMP acronym we mentioned in 
Chapter 6). You can compare their style, syntax, expressiveness, ease of use, and 
applicability to different domains. Not every problem is a nail, but a big enough 
hammer can treat it like one. 







bash Beginnings 


Many operating systems offered command-line interfaces in the early days, and they 
typically allowed commands to be stored in text files and run as batch jobs (a readily 
understood concept at the time). It soon became natural to introduce ways to sub- 
mit parameters to scripts and allow the scripts to change their behavior under differ- 
ent conditions. Unix’s shell made tremendous leaps in flexibility, turning the shell 
into a true programming language. 


Our interactive examples will show a sample shell prompt, a command with optional 
arguments, and the command’s output, like this: 


admin@server1:~$ date 
Thu Aug 24 09:16:56 CDT 2006 


We’ll show the contents of a shell script like this: 


#!/bin/bash 

contents of script... 
The first line is special in Linux scripts: if it starts with the two characters #!, the rest 
of the first line is the filename of the command to run to process the rest of the 
script. (If the # character is not followed by a !, it’s interpreted as a comment that
continues until the end of the line.) This trick lets you use any program to interpret 
your script files. If the program is a traditional shell like sh or bash, the file is called a 
shell script. At the end of the chapter we’ll show scripts for Perl, PHP, and Python. 





Microsoft Windows uses the suffix of the filename to define the file
type and what interpreter should run it. If you change a file’s suffix, it
may stop working. In Linux, filenames have nothing to do with execution
(although following conventions can be useful for other reasons).


Use your favorite text editor (or even one you don’t care for) to create this three-line 
file, and save it to a file called hello: 

#!/bin/bash 

echo hello world 

echo bonjour monde 
This file is not a working script yet. We’ll show how to actually run it in the next sec- 
tion, but first we need to explain some basic syntax rules. 


The /bin/bash shell will interpret this script line by line. It expects each command to 
be on a single line, but if you end a line with a backslash (\), bash will treat the next 
line as a continuation: 

#!/bin/bash

echo \
    hello \
    world


This is a good way to make complex lines more readable.


The shell ignores lines filled with whitespace (spaces, tabs, empty lines). It also 
ignores everything from a comment character (#) to the end of the line. When bash 
reads the second line of the hello script (echo hello world), it treats the first word (echo)
as the command to run and the other words (hello world) as its arguments. The echo 
command just copies its arguments to its output. The third line runs another echo 
command, but with different arguments. 


To see what you’ve put in the file hello, you can print its contents to the screen: 


admin@server1:~$ cat hello 
#!/bin/bash 

echo hello world 

echo bonjour monde 


Pathnames and Permissions 


The hello file can be executed by running the bash command with a hello argument: 


admin@server1:~$ bash hello 
hello world 

bonjour monde 
admin@server1:~$ 


Now let’s try to run hello without its bash chaperon: 


admin@server1:~$ hello 

bash: hello: command not found 
Why can’t bash find it? When you specify a command, Linux searches a list of direc- 
tories called the path for a file of that name and runs the first one it finds. In this 
case, hello was not in any of these directories. If you tell the system what directory 
hello is in, it will run it. The pathname can be absolute (/home/admin/hello) or rela- 
tive (./hello means the hello file in the current directory). We’ll describe how to spec- 
ify the directories in your path in the next section, but first we have to deal with 
permissions. 


A shell script won’t run without certain file permissions. Let’s check the permissions 
on hello: 

admin@server1:~$ ls -l hello
-rw-r--r-- 1 admin admin 48 2006-07-25 13:25 hello


A - indicates that the flag is not set. The leading - is the directory flag; it's d for a 
directory or - for a file. Next come the permissions for the file's owner, the group to 
which the owner belongs, and everyone else. The owner (admin) can read (r) and 
write (w) this file, while others in the group (in this case, also named admin) and 
everyone else can only read it (r--). No one can execute (run) the file, because the 
third character in each three-character set is a - instead of an x. 
Now let’s try to run hello with a relative pathname:


admin@server1:~$ ./hello 

bash: ./hello: Permission denied 
This time Linux found it but didn’t run it. It failed because the hello file does not 
have executable permissions. You need to decide who will be allowed to execute it: 
only you (the owner), anyone in your group, and/or users in other groups. This is a 
practical security decision that administrators must make frequently. If permissions 
are too broad, others can run your script without your knowledge; if they’re too nar- 
row, the script might not run at all. 


The command to change permissions is called chmod (for change mode), and it can
use old-style Unix octal numbers or letters. Let’s try it both ways, giving read/write/
execute permissions to yourself, read/execute permissions to your group, and nothing
to others (what have they ever given you?). For the octal style, read=4, write=2,
and execute=1. The user number will be 4+2+1 (7), the group 4+1 (5), and others 0:


admin@server1:~$ chmod 750 hello
admin@server1:~$ ls -l hello
-rwxr-x--- 1 admin admin 50 2006-08-03 15:44 hello


The other style of permission arguments, using letters, is probably more intuitive: 


admin@server1:~$ chmod u=rwx,g=rx,o= hello
admin@server1:~$ ls -l hello
-rwxr-x--- 1 admin admin 50 2006-08-03 15:44 hello


To quickly add read and execute permissions for yourself, your group, and others, 
enter: 

admin@server1:~$ chmod +xr hello
admin@server1:~$ ls -l hello
-rwxr-xr-x 1 admin admin 50 2006-08-03 15:44 hello


Now we can run the script from the command line: 


admin@server1:~$ ./hello 
hello world 
bonjour monde 
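

The octal arithmetic is easy to check for yourself. The following is a minimal sketch rather than part of the hello example: it builds the mode string from the read, write, and execute values, applies it to a scratch file, and asks for the resulting mode. The variable names are ours, and the stat -c %A option assumes the GNU version of stat found on most Linux systems.

```shell
#!/bin/bash
# Each octal digit is the sum of read=4, write=2, and execute=1.
user=$(( 4 + 2 + 1 ))      # rwx
group=$(( 4 + 1 ))         # r-x
other=0                    # ---
mode="$user$group$other"   # the string 750

# Try it on a scratch file so the real hello script is left alone.
scratch=$(mktemp -d)
touch "$scratch/hello"
chmod "$mode" "$scratch/hello"
listed=$(stat -c %A "$scratch/hello")   # GNU stat prints the mode as ls -l does
echo "$mode gives $listed"
rm -r "$scratch"
```

The script should report that 750 gives -rwxr-x---, matching the listing shown earlier.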


The Default Path 


The list of directories through which bash should search for commands is specified in 
a shell environment variable called PATH. To see what’s in your path, enter: 


admin@server1:~$ echo $PATH 
/bin:/usr/bin 


Linux reserves the special names . for the current directory and .. for the current 
directory’s parent directory. If you want Linux to always find commands like hello in 
your current directory, add the current directory to PATH: 


admin@server1:~$ PATH=$PATH:.


To make changes such as this one stick, you’ll need to make a permanent change to
your PATH. This can be done by an individual user in the .bashrc file located in the 
user’s home directory, or by the system administrator in a system-wide startup file 
(usually located in the /etc directory); just add a statement to the file like the com- 
mand just shown. 


Alternatively, you could move the hello script to one of the directories already in the 
PATH. However, these directories are usually protected so that only the root user can 
put files there, to preserve security. 


For a script more complex than hello (i.e., almost any script), either method has 
security implications. If . is in your PATH, you run the risk that if someone else puts a 
different script named hello in another directory and you blunder into that directory 
and type hello, you’ll execute the other user’s hello and not the one you intended. 


The correctness of the script is also a concern. We’re reasonably sure about what our 
hello script does now, but we might not be after adding a hundred more lines. 


A common practice is to put your own scripts in a directory like /usr/local/bin or a pri- 
vate ~/bin rather than a system directory like /bin, /sbin, or /usr/bin. To add this direc- 
tory to your PATH permanently, add a line like the following at the end of your .bashrc 
file: 


export PATH=$PATH:/usr/local/bin 
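

If your .bashrc is read more than once (for instance, by nested shells), a plain append makes PATH grow a duplicate entry each time. One way around this, sketched here with a hypothetical helper function called add_to_path, is to append the directory only when it is not already listed:

```shell
#!/bin/bash
# add_to_path DIR: append DIR to PATH unless it is already listed.
add_to_path() {
    case ":$PATH:" in
        *":$1:"*) ;;                  # already present, so do nothing
        *) PATH="$PATH:$1" ;;         # otherwise append it
    esac
}

add_to_path /usr/local/bin
add_to_path /usr/local/bin            # the second call changes nothing
export PATH
echo "$PATH"
```

Wrapping PATH in colons before matching lets the same pattern catch the directory whether it sits at the start, the middle, or the end of the list.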


I/O Redirection


I/O redirection and pipes are more Unix innovations that Microsoft and many others 
have copied without shame. The shell gives you access to these features in a very 
intuitive way. 


When you’re typing a command at the console or in a text window, your fingers pro- 
vide the command’s standard input, and your eyes read the command’s standard out- 
put and standard error output. However, you can produce input or capture that 
output by replacing your fingers or your eyes with a file. Let’s run the ls command 
with its standard output going to the screen as usual, and then redirected (with >) to 
a file: 

admin@server1:~$ ls 

hello 

admin@server1:~$ ls > files.txt 

admin@server1:~$ 
In the second example, the redirection happens silently. If any errors occurred, how- 
ever, you would see them on the display rather than in the file (that’s why standard 
error exists): 

admin@server1:~$ ls ciao > files.txt 


ls: ciao: No such file or directory 
admin@server1:~$ 
Be aware that if files.txt exists before you run these commands, it will be overwritten.
If you want to append new content to the file rather than overwriting it, use the
append (>>) characters instead: 


admin@server1:~$ ls -l >> files.txt 
If files.txt does not exist, it will be created before the appending starts. 


You can also redirect standard error. Here is a dazzling display that redirects both 
standard output and standard error at the same time: 


admin@server1:~$ ls -l > files.txt 2> errors.txt 


The inelegant 2> is the standard error redirection magic. Standard error redirection 
can be useful with long processes such as compilations, so you can review any error 
messages later rather than hovering over the screen. 


If you want to redirect standard output and standard error to the same file, do this: 
admin@server1:~$ ls -l > files.txt 2>&1 


The &1 means “the same place as standard output,” which in this case is files.txt. A 
shortcut for the previous command is: 


admin@server1:~$ ls -l >& files.txt
Use >> rather than > anywhere you want to append rather than overwrite. 
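

To watch the two output streams part ways, here is a small sketch that works in a scratch directory. It lists one file that exists and one that doesn’t, so ls has something to say on both standard output and standard error. The file names are only for illustration.

```shell
#!/bin/bash
scratch=$(mktemp -d)
touch "$scratch/hello"

# hello exists and ciao does not, so ls writes to both streams.
# ls exits with a non-zero status here; || true just acknowledges that.
ls "$scratch/hello" "$scratch/ciao" > "$scratch/files.txt" 2> "$scratch/errors.txt" || true

# Both streams into one file: 2>&1 has to come after the > redirection.
ls "$scratch/hello" "$scratch/ciao" > "$scratch/both.txt" 2>&1 || true

out_lines=$(wc -l < "$scratch/files.txt")
err_lines=$(wc -l < "$scratch/errors.txt")
both_lines=$(wc -l < "$scratch/both.txt")
echo "stdout: $out_lines, stderr: $err_lines, combined: $both_lines"
rm -r "$scratch"
```

The order matters because the shell processes redirections from left to right: 2>&1 copies wherever standard output points at that moment, so writing it before > both.txt would send the errors to the screen instead.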


It’s only fair that standard input may also be redirected. Here’s a contrived example 
that searches for filenames containing the string foo: 

admin@server1:~$ ls -l > files.txt 

admin@server1:~$ grep foo < files.txt 

admin@server1:~$ rm files.txt 
The first step creates the temporary file files.txt. The second step reads from it, and in 
the third step we practice good disk hygiene and get rid of it. The temporary file’s life 
was short but productive. 


We can combine these three steps into one and avoid the temporary file with Unix’s 
best invention, the pipe. A pipe connects the output of one command to the input of 
another command. The pipe symbol is |, like a > and < meeting at great speed. The 
standard output of the first command becomes standard input for the second com- 
mand, simplifying our earlier steps: 


admin@server1:~$ ls -l | grep foo


You can also chain pipes together:


admin@server1:~$ ls -l | grep foo | wc -l


This command counts the lines of the ls -l listing that contain the string foo, which is
roughly the number of files in the current directory whose names contain foo.
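

Because the result of ls depends on what happens to be in your directory, here is a sketch of the same three-stage pipeline with printf standing in for ls. The three file names are made up for the occasion.

```shell
#!/bin/bash
# printf prints one name per line, just as ls would when writing to a pipe.
count=$(printf '%s\n' foo.txt bar.txt food.gif | grep foo | wc -l)
echo "$count names contain foo"
```

Two of the three names contain foo, so the count is 2. grep can also do the counting itself with its -c option, which saves the final stage of the pipe.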
Variables


bash is a programming language, and programming languages have common fea- 
tures. One of the most basic is the variable: a symbol that contains a value. bash vari- 
ables are strings unless you specify otherwise with a declare statement. You don’t 
need to declare or define bash variables before you use them, unlike with many other 
languages. 


A variable’s name is a string starting with a letter and containing letters, numbers, or 
underscores (_). A variable’s value is obtained by putting a $ character before the 
variable’s name. Here’s a shell script that assigns a string value to the variable hw, 
then prints it: 


#!/bin/bash 
hw="hello world" 
echo $hw 


The variable hw is created by the assignment in line 2. In line 3, the contents of the 
variable hw will replace the $hw reference. Because bash and other shells treat 
whitespace characters (spaces and tabs) as command argument separators rather 
than normal argument characters, to preserve them you must surround the whole 
string with double quote (") or single quote (') characters. The difference is that shell 
variables (and other special shell syntax) are expanded within double quotes and 
treated literally within single quotes. Look at the difference in output from the two 
echo commands in the following script: 


admin@server1:~$ cat hello2 
#!/bin/bash 

hw="hello world" 

echo "$hw" 

echo '$hw' 

admin@server1:~$ ./hello2 
hello world 

$hw 

admin@server1:~$ 


You can assign the standard output of a command to a variable with the $(command) 
or `command` (using little grave accents) syntax:


admin@server1:~$ cat today 

#!/bin/bash 

dt=$(date) 

dttoo=`date`

echo "Today is $dt" 

echo "And so is $dttoo" 
admin@server1:~$ ./today 

Today is Tue Jul 25 14:56:01 CDT 2006 
And so is Tue Jul 25 14:56:01 CDT 2006 
admin@server1:~$ 


Special variables represent command-line arguments. The $ character followed by a 
number n refers to the nth argument on the command line, starting from 1. The $0
variable is the name of the script itself. The $* variable contains all the arguments as
one string value. These variables can then be passed along to commands the script 
runs: 

admin@server1:~$ cat files
#!/bin/bash
ls -Alv $*
admin@server1:~$ ./files hello hello2 today
-rwxr-xr-x 1 admin admin 48 2006-07-25 13:25 hello
-rwxr-xr-x 1 admin admin 51 2006-07-25 14:45 hello2
-rwxr-xr-x 1 admin admin 45 2006-07-25 14:49 today
admin@server1:~$


The special variable $$ contains the current process’s process ID. This can be used to 
create a unique temporary filename. If multiple copies of the same script are running 
at the same time, each will have a different process ID and thus a different tempo- 
rary filename. 
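

As a minimal sketch of the idea, the following script builds a scratch filename from $$, uses the file, and removes it. The listing prefix and the use of the TMPDIR variable (falling back to /tmp) are our own choices for illustration.

```shell
#!/bin/bash
# Build a per-process scratch filename, such as /tmp/listing.12345.
tmpfile="${TMPDIR:-/tmp}/listing.$$"

ls / > "$tmpfile"                  # use the file ...
entries=$(wc -l < "$tmpfile")
rm -f "$tmpfile"                   # ... and clean up afterwards
echo "process $$ counted $entries entries in /"
```

Process IDs are predictable, so for anything security-sensitive the mktemp command is a safer way to get a unique temporary file.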


Another useful variable is $?, which contains the return status of the most recent 
command executed. We’ll use this later in this chapter to check for the success or 
failure of program execution in a script. 
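

Here is a short sketch of $? at work. By convention, a status of 0 means success and anything else means failure. The directory name is deliberately one that should not exist.

```shell
#!/bin/bash
true                       # a command that always succeeds
ok=$?                      # 0

missing=0
# If ls fails, remember its status; otherwise missing stays 0.
ls /no/such/directory > /dev/null 2>&1 || missing=$?

echo "true returned $ok, ls returned $missing"
```

Every command resets $?, so copy it into a variable of your own right away if you need it more than once.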


Useful Elements for bash Scripts 


We’ve introduced the basic elements of bash that you'll use in the everyday running 
of interactive commands. Now let’s look at some things that will help you write 
effective scripts. 


Expressions 


bash expressions contain variables and operators such as == (equals) and > (greater 
than). These are usually used in tests, which can be specified in several ways: 

test $file == "test" 

[ $file == "test" ] 

[[ $file == "test" ]] 
If you use the test command or single square brackets, remember that some symbols
have multiple meanings (for instance, in an earlier section we used > for output
redirection), so they need to be quoted or escaped. You don’t have to worry about this
inside double square brackets, which bash parses specially. The double brackets do
everything the single ones do and a bit more, so it’s safest to use double brackets with
your expressions.
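

The difference shows up as soon as an expression contains >. In this sketch (the variable names double and single are ours), the same comparison is written both ways. Inside double brackets the > is a string comparison; with test it has to be escaped, or the shell would treat it as a redirection and quietly create a file named a.

```shell
#!/bin/bash
file="test"

# Inside [[ ]], > compares strings and needs no quoting.
if [[ $file == "test" && "b" > "a" ]]; then
    double="true"
else
    double="false"
fi

# With test (or [ ]), > is escaped with a backslash to stop the redirection.
if test "$file" == "test" && test "b" \> "a"; then
    single="true"
else
    single="false"
fi
echo "[[ ]]: $double, test: $single"
```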


bash has some useful special built-in operators: 


-a file # true if file exists (-e file means the same and is more common)
-d file # true if file exists and is a directory
-f file # true if file exists and is a regular file
-r file # true if file exists and is readable
-w file # true if file exists and is writable
-x file # true if file exists and is executable
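

These operators combine naturally with the double-bracket tests. The following sketch defines a hypothetical function, describe, that reports what kind of thing a pathname refers to:

```shell
#!/bin/bash
# describe PATHNAME: report what kind of thing PATHNAME names.
describe() {
    if [[ ! -e $1 ]]; then
        echo "$1 does not exist"
    elif [[ -d $1 ]]; then
        echo "$1 is a directory"
    elif [[ -f $1 ]]; then
        echo "$1 is a regular file"
    else
        echo "$1 is something else"
    fi
}

describe /etc
describe /etc/passwd
describe /no/such/path
```

The final branch catches things like device files and sockets, which exist but are neither directories nor regular files.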
Arithmetic


bash is heavily weighted toward text such as commands, arguments, and filenames. 
It can evaluate the usual arithmetic expressions (using +, -, *, /, and other operators) 
by surrounding them with a pair of double parentheses: ((expression)). Because 
many arithmetic characters—including *, (, and )—are specially interpreted by the 
shell, it’s best to quote shell arguments if they will be treated as math expressions in 
the script: 


admin@server1:~$ cat arith 

#!/bin/bash 

answer=$(( $* )) 

echo $answer 

admin@server1:~$ ./arith "(8+1)*(7-1)-60" 
-6 

admin@server1:~$ ./arith "2**60" 
1152921504606846976 

admin@server1:~$ 


The latest version of bash supports 64-bit integers (-9223372036854775808 to
9223372036854775807). Older versions support only 32-bit integers (with a puny
range of -2147483648 to 2147483647). Floating-point numbers are not supported.
Scripts that need floating-point or more advanced operators can use an external pro- 
gram such as bc. 
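

If bc isn’t handy and you only need a couple of decimal places, a common workaround is to scale the numbers up, do integer arithmetic, and put the decimal point back when printing. This sketch (the variable names are ours) shows both the truncation and the workaround:

```shell
#!/bin/bash
half=$(( 7 / 2 ))                  # integer division truncates: 3

# Fixed-point workaround: scale by 100, then split off the last two digits.
scaled=$(( 7 * 100 / 2 ))          # 350
printf -v pretty '%d.%02d' $(( scaled / 100 )) $(( scaled % 100 ))
echo "7 / 2 = $half as an integer, $pretty when scaled"
```

The -v option of bash’s built-in printf stores the formatted result in a variable instead of printing it.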


In arithmetic expressions, you can use variables without the $ character that would 
be used to substitute their values in other settings: 


admin@server1:~$ cat arithexp 
#!/bin/bash 

a=$1 
b=$(( a+2 )) 

echo "$a + 2 = $b" 
c=$(( a*2 )) 

echo "$a * 2 = $c" 
admin@server1:~$ ./arithexp 6
6 + 2 = 8
6 * 2 = 12
admin@server1:~$





If... 


Given expressions, you can execute different chunks of code depending on the 
results of tests. bash uses the if ... fi (backwards if) syntax, with optional elif 
(else if) and else sections: 
if expression1 ; then
    (commands)
elif expression2 ; then
    (commands)
...
elif expressionN ; then
    (commands)
else
    (commands)
fi


The ; then phrase at the end of a line can also be expressed as a plain then on the 
next line: 


if expression
then
    (commands)
fi


If you’re in the same directory as the hello script you made earlier, try this: 


admin@server1:~$ if [[ -x hello ]]
> then
>    echo "hello is executable"
> fi
hello is executable
admin@server1:~$


Here’s a fancier script that searches the /etc/passwd file for an account name: 


#!/bin/bash
USERID="$1"
DETECTED=$( egrep -o "^$USERID:" < /etc/passwd )
if [[ -n "${DETECTED}" ]] ; then
    echo "$USERID is one of us :-)"
else
    echo "$USERID is a stranger :-("
fi


Let’s call this script friendorfoe, make it executable, and try it with first a known 
account on our system (root) and then a made-up account (sasquatch): 


admin@server1:~$ ./friendorfoe root
root is one of us :-)
admin@server1:~$ ./friendorfoe sasquatch
sasquatch is a stranger :-(


The first argument is assigned to the shell variable USERID. The egrep command is run 
within $() to assign its output to the DETECTED shell variable. egrep -o prints only the 
string it matches, rather than the whole line. "^$USERID:" matches the contents of the
USERID variable only if the contents of the variable appear at the start of a line and are 
immediately followed by a colon. The if expression is surrounded with double square 
brackets to contain it, evaluate it, and return its result. The -n "${DETECTED}" expres- 
sion returns true if the shell variable DETECTED is a non-empty string. Finally, the vari- 
able DETECTED is quoted ("${DETECTED}") to treat it as a single string. 


Wherever the if statement takes an expression, you can put in a command, or even a 
sequence of commands. If the last command in the sequence succeeds, the if state- 
ment considers that the expression returned a true result. If the last command in the 
sequence fails, it’s considered that the expression returned a false result, and the else 
expression will be executed. We’ll see examples in upcoming sections. 
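

As a quick sketch of this idea, the account lookup can be written with the grep command itself as the condition, with no brackets and no intermediate variable. The function name greet is ours. grep -q prints nothing and simply succeeds if it finds a match.

```shell
#!/bin/bash
# greet NAME: use grep's exit status directly as the if condition.
greet() {
    if grep -q "^$1:" /etc/passwd; then
        echo "$1 is one of us :-)"
    else
        echo "$1 is a stranger :-("
    fi
}

greet root
greet sasquatch
```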
Troubleshooting a Simple Script


Let’s perform some surgery on a script that is supposed to delete its argument (a file 
or a directory) but has a few problems: 


admin@server1:~$ cat delete
#!/bin/bash
if rm $1
then
    echo file $1 deleted
else
    if rmdir $1
    then
        echo directory $1 deleted
    fi
fi


The script is intended to delete the file passed as an argument using rm, and to print 
a message if it succeeds. If rm fails, the script assumes the argument refers to a direc- 
tory and tries rmdir instead. 


Here are some results: 


admin@server1:~$ ./delete hello2
file hello2 deleted
admin@server1:~$ ./delete hello2
rm: cannot remove `hello2': No such file or directory
rmdir: `hello2': No such file or directory
admin@server1:~$ mkdir hello3
admin@server1:~$ ./delete hello3
rm: cannot remove `hello3': Is a directory
directory hello3 deleted
admin@server1:~$





Using these error messages, let’s try to fix the script. First, we’ll use I/O redirection 
to save results to log and error files, which we can review in our copious free time. 
Next, we’ll catch the return value of the rm command to generate a success or fail- 
ure message. We’ll also capture the current date and time to include in the output 
log: 


admin@server1:~$ cat removefiles
#!/bin/bash
# removefiles deletes either files or directories
echo "$0 ran at" $(date) >> delete.log
if rm $1 2>> delete-err.log
then
    echo "deleted file $1" >> delete.log
elif rmdir $1 2>> delete-err.log
then
    echo "deleted directory $1" >> delete.log
else
    echo "failed to delete $1" >> delete.log
fi


The script still has some warts: it doesn’t check if the file even exists, and it doesn’t
distinguish between a file and a directory. We can use some of the built-in operators 
that we mentioned earlier to fix these problems: 


admin@server1:~$ cat removefiles
#!/bin/bash
# removefiles deletes either files or directories
echo "$0 ran at" $(date) >> delete.log
if [ ! -e $1 ]
then
    echo "$1 does not exist" >> delete.log
elif [ -f $1 ]
then
    echo -n "file $1 " >> delete.log
    if rm $1 2>> delete-err.log
    then
        echo "deleted" >> delete.log
    else
        echo "not deleted" >> delete.log
    fi
elif [ -d $1 ]
then
    echo -n "directory $1 " >> delete.log
    if rmdir $1 2>> delete-err.log
    then
        echo "deleted" >> delete.log
    else
        echo "not deleted" >> delete.log
    fi
fi


This looks pretty good, but we have one more curve to throw you: what if the file or 
directory name contains spaces? (You’re guaranteed to see this if you get any files 
from Windows or Mac systems.) Create a file called my file, then try to delete it with 
our trusty script: 


admin@server1:~$ ./removefiles my file 
Then the last line of delete.log will contain: 
my does not exist 


Since we didn’t put quotes around my file, the shell split my and file into the script’s 
$1 and $2 variables. So, let’s quote my file to keep it in $1: 
admin@server1:~$ ./removefiles "my file" 
./removefiles: [: my: binary operator expected 
./removefiles: [: my: binary operator expected 
Oops. We got the string my file into the shell’s $1 variable, but we need to quote it 
again inside the script to protect it for the name tests and remove commands: 
admin@server1:~$ cat removefiles
#!/bin/bash
# removefiles deletes either files or directories
echo "$0 ran at" $(date) >> delete.log
if [ ! -e "$1" ]
then
    echo "$1 does not exist" >> delete.log
elif [ -f "$1" ]
then
    echo -n "file $1 " >> delete.log
    if rm "$1" 2>> delete-err.log
    then
        echo "deleted" >> delete.log
    else
        echo "not deleted" >> delete.log
    fi
elif [ -d "$1" ]
then
    echo -n "directory $1 " >> delete.log
    if rmdir "$1" 2>> delete-err.log
    then
        echo "deleted" >> delete.log
    else
        echo "not deleted" >> delete.log
    fi
fi


Now, at last, when you run the command: 
admin@server1:~$ ./removefiles "my file" 
the last line of delete.log will be: 
file my file deleted 
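

The same quoting discipline applies when a script accepts several names at once. The sketch below is not part of removefiles; it uses a hypothetical function, remove_all, and the special variable "$@", which we have not covered so far. Unlike $*, a quoted "$@" expands to each argument as a separate word with its spaces intact.

```shell
#!/bin/bash
# remove_all NAME...: delete each named file, keeping spaces in names intact.
remove_all() {
    local name
    for name in "$@"; do
        if [ -f "$name" ] && rm "$name"; then
            echo "file $name deleted"
        else
            echo "$name not deleted"
        fi
    done
}

# Try it in a scratch directory with one awkward name and one ordinary one.
scratch=$(mktemp -d)
touch "$scratch/my file" "$scratch/hello2"
remove_all "$scratch/my file" "$scratch/hello2"
rmdir "$scratch"
```

The final rmdir only succeeds if both files really were removed, which makes it a cheap check that the quoting worked.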


Loops 


If you want to do something more than once, you need a loop. bash has three fla- 
vors: for, while, and until. 


The lovely and talented for loop has this general appearance: 


for arg in list
do
    commands
done


It executes the commands specified between do and done (which can cover as many
lines and separate commands as you want) once for each item in list. When the
commands run, they can access the current item from list through the variable $arg.
The syntax may be a bit confusing at first: in the for statement you must specify arg
without the dollar sign, but in the commands you must specify $arg with the dollar
sign.


Some simple examples are: 


admin@server1:~$ for stooge in moe larry curly
> do
>    echo $stooge
> done
moe
larry
curly


admin@server1:~$ for file in *
> do
>    ls -l $file
> done
-rw-r--r-- 1 admin admin 48 2006-08-26 14:12 hello


admin@server1:~$ for file in $(find / -name \*.gif)
> do
>    cp $file /tmp
> done
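

One caution about the last example: the output of $(find ...) is split at every space, so a file named two words.gif would arrive in the loop as two separate items. When the files live in one directory, a filename pattern with a quoted loop variable avoids the problem. Here is a sketch using a scratch directory and made-up file names:

```shell
#!/bin/bash
scratch=$(mktemp -d)
touch "$scratch/one.gif" "$scratch/two words.gif" "$scratch/notes.txt"

count=0
for file in "$scratch"/*.gif       # the pattern yields one item per file
do
    count=$(( count + 1 ))
    echo "found: $file"
done
echo "$count GIF files"
rm -r "$scratch"
```

The loop runs twice, once for each GIF file, even though one of the names contains a space.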


The while loop runs while the test condition is true: 


while expression
do
    stuff
done


Here’s an example script that uses the arithmetic expressions mentioned earlier to 
create a C-style while loop (the indentation isn’t necessary, but we like it): 


#!/bin/bash
MAX=100
((cur=1)) # Treat cur like an integer
while ((cur < MAX))
do
    echo -n "$cur "
    ((cur+=1)) # Increment as an integer
done
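

Probably the most common while loop in system scripts reads a file one line at a time. This sketch combines the loop with the input redirection shown earlier and uses the built-in read command, which succeeds as long as there is another line to read. The variable names are ours.

```shell
#!/bin/bash
# Count the entries in /etc/passwd, one line per trip around the loop.
accounts=0
while read -r line
do
    ((accounts+=1))
done < /etc/passwd
echo "$accounts accounts"
```

The redirection goes after done, so it applies to the whole loop rather than to a single command inside it.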


The until loop is the opposite of while. It loops until the test condition is true: 


until expression
do
    stuff
done


An example is: 


#!/bin/bash
gameover="q"
until [[ $cmd == $gameover ]]
do
    echo -n "Your command ($gameover to quit)? "
    read cmd
    if [[ $cmd != $gameover ]]; then $cmd; fi
done


To escape from a loop, use break. Let’s rewrite our until example as a while loop
with a break: 


#!/bin/bash
gameover="q"
while [[ true ]]
do
    echo -n "Your command ($gameover to quit)? "
    read cmd
    if [[ $cmd == $gameover ]]; then break; fi
    $cmd
done





To skip the rest of the loop and jump back to the start, use continue: 


#!/bin/bash
gameover="q"
while [[ true ]]
do
    echo -n "Your command ($gameover to quit)? "
    read cmd
    if [[ $cmd != $gameover ]]; then $cmd; continue; fi
    break
done
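

The interactive examples are awkward to try from a script, so here is a non-interactive sketch that uses both statements in a for loop. The word list is made up: items that look like comments are skipped with continue, and the letter q ends the loop early with break.

```shell
#!/bin/bash
kept=""
for word in alpha "#note" beta q gamma
do
    if [[ $word == "#"* ]]; then continue; fi   # skip comment-like items
    if [[ $word == "q" ]]; then break; fi       # stop at the quit marker
    kept="$kept$word "
done
echo "$kept"
```

Only alpha and beta are kept: #note is skipped, and gamma is never reached.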


cron Jobs 


Shell scripts are often used to glue programs together. A common example in Linux 
is the definition of cron jobs. cron is the standard Linux job scheduler. If you want 
something to happen the third Tuesday of every month at the uncivilized hour of 
01:23, you can get cron to do it for you without any of the negative feedback that 
you would get from a person. The cron daemon checks every minute to see whether 
it’s time to do something, or if any cron job specifications have changed. 


You specify cron jobs by editing a crontab file. You can view the contents of your 
crontab, if any, as follows: 


admin@server1:~$ crontab -l
no crontab for admin 


To edit your crontab, enter: 
admin@server1:~$ crontab -e 


Each line of a crontab file contains a day/time specification and a command, in this 
format: 


minute hour day_of_month month day_of_week command


This requires more than a little explanation:


• minute is between 0 and 59.
• hour uses the 24-hour clock and is between 0 and 23.
• day_of_month ranges from 1 to 31.
• month is a number between 1 and 12 or a name abbreviated to three letters, such
  as feb.
• day_of_week is a number between 0 and 7 (0 or 7 is Sunday, 6 is Saturday) or a
  name abbreviated to three letters, such as tue.
• day_of_month and day_of_week are ORed together, which may cause surprises.
  For instance, if each field contains a 1, cron will execute the command on the
  first day of the month as well as on Mondays. Usually, the crontab line puts a
  specific value in only one of these fields.
• In any field, a value means an exact match; for instance, a 1 in the month field
  means only January.
• An asterisk (*) means any value.
• Two values separated by a hyphen indicate a range. Thus, 11-12 in the month
  field means November through December.
• To specify more than one value, separate the values with commas. A month list of
  2,3,5-6 means February, March, and May through June.
• A step modifier may follow values and a slash (/), and it indicates how many
  units to increment between values. A month value of */3 means every third
  month. A month value of 4-9/2 means months 4, 6, and 8.
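

Because the two day fields are ORed, the third-Tuesday job mentioned earlier can’t be expressed by filling in both of them. One approach, sketched here under our own assumptions, is to let cron start the job every night from the 15th through the 21st (the only dates on which a third Tuesday can fall) with an entry such as 23 1 15-21 * * followed by the script’s path, and have the script itself check the weekday. The function name is_third_tuesday is hypothetical.

```shell
#!/bin/bash
# is_third_tuesday DOW DOM: succeed if weekday DOW (1=Monday ... 7=Sunday,
# as printed by date +%u) and day of month DOM fall on the third Tuesday.
is_third_tuesday() {
    local dow=$1 dom=$((10#$2))    # 10# stops 08 and 09 being read as octal
    [[ $dow -eq 2 && $dom -ge 15 && $dom -le 21 ]]
}

if is_third_tuesday "$(date +%u)" "$(date +%d)"; then
    echo "third Tuesday: run the monthly job"
fi
```

Passing the weekday and date as arguments, rather than calling date inside the function, makes the check easy to try with any values you like.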


The shell executes the command, so it can use the features mentioned in this chap- 
ter. Some examples using direct commands rather than scripts are: 

*/5 * * * * rm /tmp/*.gif # remove all GIF files every 5 minutes
*/5 * * * * rm -v /tmp/*.gif >> /tmp/gif.log # the same, logged


When cron runs the command, it emails its standard output and standard error to
the owner of the crontab. To prevent being pelted with such emails, you can redirect 
the standard output and standard error to a place where the sun doesn’t shine: 


command > /dev/null 2>&1 


Scripting Language Shootout 


The main use of a shell is to run commands and expand filename patterns, and shells 
were designed to make these operations easy. Other tasks, such as performing arith- 
metic calculations, are harder, because their text needs to be protected from word 
splitting and * expansion. In complex shell scripts, the pile of parentheses, brackets, 
and other symbols begins to resemble a cartoon character swearing. 


In the old days (“We had zeroes and ones then, and we were lucky to have ones!”), 
how-to articles often featured long shell scripts to add users, download and build 
packages, back up files, and so on. Nowadays you may prefer to carry out these tasks 
using a more advanced scripting language, for several reasons: 


• Over time, applications such as adduser and apt-get have automated some
  traditional shell-script tasks.
• Shell scripts don’t scale well, and they get hard to maintain.
• Shell scripts run slower.
• Shell syntax is icky.


Perl initially filled the gap as administrators looked for more productive tools, but 
now PHP has migrated out of its web niche, and Python has gained a reputation for 
productivity. We’ll write one application in each of these languages; several others, 
such as Ruby and Tcl, are also available on Linux. 


Our application will search the /etc/passwd file for name, user ID, hat size, or what- 
ever else we can find in there. You'll see how to open a file, read records, parse for- 
mats, search for patterns, and print results. Then we’ll look at ways to avoid some of 
this work, because sweat != productivity. You’ll be able to apply these techniques to 
other files, such as logs or web pages. This is an example of data munging, and you’re 
probably doing a lot of it already. 


Let’s invent some requirements for our application and express them with this 
pseudocode: 


read a search string from the user
open the password file
for each line:
    parse the fields (columns)
    search the name field for a match
    if there's a match:
        print the other fields in a readable format


By now, many programmers would have rushed in and started typing (some without 
having read the data format or requirements). Readers of this book are more disci- 
plined, though, as well as better looking. They’ve had to fix the messes that the other 
programmers have made and don’t want to make the same mistakes themselves. 


Data Format: The /etc/passwd File 


The password file usually contains standard system accounts such as the mighty 
root, application accounts such as apache, and user accounts. Here are snips of such 
a file: 


# System
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin


# Applications
postgres:x:26:26:PostgreSQL Server:/var/lib/pgsql:/bin/bash
apache:x:48:48:Apache:/var/www:/bin/false


# Users
adedarc:x:500:500:Alfredo de Darc:/home/adedarc:/bin/bash
rduxover:x:501:501:Ransom Duxover:/home/rduxover:/bin/bash
cbarrel:x:502:502:Creighton Barrel:/home/cbarrel:/bin/bash
cmaharias:x:503:503:C Maharias:/home/cmaharias:/bin/bash
pgasquette:x:504:504:Papa Gasquette:/home/pgasquette:/bin/bash
bfrapples:x:505:505:Bob Frapples:/home/bfrapples:/bin/bash


The colon-separated fields are: 


• Account name
• Encrypted password, or x if /etc/shadow is used
• User ID (uid)
• Group ID (gid)
• Full name or description
• Home directory
• Shell


We’re interested in the fifth field (full name or description). In the ancient Unix
scrolls, this was called the gecos field, for reasons that were obsolete even then. The
name persists, and it’s useful to know.
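
The seven fields listed above map naturally onto a small record type. The following is a minimal Python 3 sketch, assuming the field order given in the list; the names PasswdEntry and parse_passwd_line are made up for this illustration and are not part of any library:

```python
from collections import namedtuple

# Field names follow the list above; "gecos" is the full name or description.
# PasswdEntry and parse_passwd_line are illustrative names, not a library API.
PasswdEntry = namedtuple(
    "PasswdEntry", "account password uid gid gecos home shell"
)


def parse_passwd_line(line):
    """Split one /etc/passwd line into its seven colon-separated fields."""
    fields = line.rstrip("\n").split(":")
    if len(fields) != 7:
        raise ValueError("expected 7 fields, got %d" % len(fields))
    account, password, uid, gid, gecos, home, shell = fields
    # uid and gid are stored as text in the file; convert them to integers.
    return PasswdEntry(account, password, int(uid), int(gid), gecos, home, shell)


entry = parse_passwd_line(
    "bfrapples:x:505:505:Bob Frapples:/home/bfrapples:/bin/bash"
)
print(entry.gecos)  # the fifth field
```

Naming the fields once means later code can refer to entry.gecos or entry.uid instead of remembering that the name lives at index 4.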


Script Versions 


We'll start each of the following sections with a minimal script that searches for a 
string anywhere in the /etc/passwd file and prints the matching line. We know this is 
too broad, but we want to get the script working before we get too fancy. 


Next, we’ll split the input lines into fields and restrict the pattern matching to the 
gecos field that contains our users’ names. 


Then we’ll further restrict the search to lines in which the value of the uid field is
greater than 500. In our case, normal user IDs start at 501, so this will exclude system
accounts and other automatons.


By this point we’ll be pretty tired of the previous steps, so we’ll look for some tools 
that can do some of this work for us. 


The bash script 


Most languages provide function libraries for various tasks. Programs fill this role for 
the shell, and experienced shell scripters are familiar with the most useful Linux utili- 
ties (cat, head, tail, awk, cut, grep, egrep, and others). We’ll use some of these for our 
bash script. 


Here’s a quick and dirty version (finduser.sh) that reads the user’s search string as its 
argument, searches for a case-independent match anywhere on the line, and prints 
any matching line verbatim: 


#!/bin/bash
grep -i "$1" /etc/passwd

admin@server1:~$ chmod +x finduser.sh
admin@server1:~$ ./finduser.sh alf
adedarc:x:500:500:Alfredo de Darc:/home/adedarc:/bin/bash


This wasn’t any faster than just typing: 
admin@server1:~$ grep -i alf /etc/passwd 


But what if alf had also matched a system account named gandalf, or a string in some
other field? If we want to restrict the search to the name field and to normal user
accounts (i.e., accounts with user IDs greater than 500), our script is going to grow a
bit.


Digging through bash documentation reveals that bash can split its input on charac- 
ters other than whitespace, using its IFS variable. In the following version of the 
script, we read /etc/passwd line by line, splitting each line into field variables. If we 
find a match, we need to rebuild the line to print it in its original form: 


#!/bin/bash
pattern=$1
IFS=":"
while read -r account password uid gid name directory shell
do
    # Exact case-sensitive matches only!
    if [[ $name == "$pattern" ]]; then
        echo "$account:$password:$uid:$gid:$name:$directory:$shell"
    fi
done < /etc/passwd


But now we run into a problem with matching: unlike grep, bash does not have a 
built-in case-insensitive partial string match. We’ll have to put in more sophisticated 
pattern matching with an external helper, egrep: 


#!/bin/bash
pattern=$1
IFS=":"
while read -r account password uid gid name directory shell
do
    # Count the case-insensitive matches in the name field
    if [[ $(echo "$name" | egrep -i -c "$pattern") -gt 0 ]]; then
        echo "$account:$password:$uid:$gid:$name:$directory:$shell"
    fi
done < /etc/passwd


For our final script, let’s add our check on the uid numbers: 


#!/bin/bash
pattern=$1
IFS=":"
while read -r account password uid gid name directory shell
do
    # Normal users only; partial, case-insensitive match on the name field
    if [[ $uid -gt 500 && $(echo "$name" | egrep -i -c "$pattern") -gt 0 ]]; then
        echo "$account:$password:$uid:$gid:$name:$directory:$shell"
    fi
done < /etc/passwd


If you run a shell script with the -v or -x option (for example, bash -x ./finduser.sh alf),
bash will print each command before executing it. This can help you see what the script
is actually doing.


The Perl script 


Perl is terse, and it’s really, really good at text. A Perl equivalent of our first bash 
script is: 

admin@server1:~$ perl -ne 'print if /alf/i' /etc/passwd


The /pattern/ matches pattern while the following i ignores case. Here’s an equivalent
script version that we’ll use to beef up the program to meet our other requirements:

#!/usr/bin/perl
my $pattern = shift;
while (<>) {
    if (/$pattern/i) {
        print;
    }
}


Many elements of Perl syntax are cryptic, but some are reminiscent of shell syntax 
(or other common Unix tools) and therefore not too hard to remember once you 
know those tools. In particular, you can see while and if statements in the previous 
script, and they behave as you might expect having learned about the shell equiva- 
lents. The <> syntax is also reminiscent of the < and > of shell redirection; it causes 
each iteration of the while loop to read one line of input. Note that unlike with bash, 
variables in Perl require the initial $ even when you’re assigning values. The print 
statement displays what <> finds. 


Perl has an alternative backward if syntax that saves a few characters: 


#!/usr/bin/perl
my $pattern = shift;
while (<>) {
    print if /$pattern/i;
}


The script (call it finduser.pl) assumes the password file is read from standard input, 
so you would run it like this: 


admin@server1:~$ ./finduser.pl alf < /etc/passwd

The next version opens the password file directly:


#!/usr/bin/perl
my $fname = "/etc/passwd";
my $pattern = shift;
open(FILE, $fname) or die("Can't open $fname\n");
while (<FILE>) {
    if (/$pattern/i) {
        print;
    }
}
close(FILE);


To restrict matches to the name field as we did in the bash section, we play to Perl’s
strengths:

#!/usr/bin/perl
my $fname = "/etc/passwd";
my $pattern = shift;
open(FILE, $fname) or die("Can't open $fname\n");
while (<FILE>) {
    $line = $_;             # keep the whole line for printing
    @fields = split /:/;    # split $_ on colons
    if ($fields[4] =~ /$pattern/i) {
        print $line;
    }
}
close(FILE);


An argument supplied by the user is read into the $pattern variable using the shift 
statement. The script also defines another kind of variable: an array named @fields. 
Perl’s split function puts each colon-separated element of a line into a single ele- 
ment of the array. We can then extract element number 4 (which is really the fifth 
element, because elements are numbered starting from 0) and compare it in a case- 
insensitive manner to the user’s argument. 


All of these scripts have involved reading text input lines and matching patterns. 
Because /etc/passwd is such an important file in Linux, you’d think someone would 
have automated some of this work by now. Fortunately, someone has: good old Perl 
provides a built-in function called getpwent that returns the contents of /etc/passwd a 
line at a time as an array of strings. In the first of the following two versions of our
script, we assign each field its own variable; in the second, we use the array @fields to
hold all of them. In each case, we want the gecos field (called gcos in the Perl
documentation). Note that this is field 6 as returned by getpwent, not field 4, because
getpwent supports two other fields that appear in the passwd files on some systems:

#!/usr/bin/perl
$pattern = shift;
while (($name, $passwd, $uid, $gid,
        $quota, $comment, $gcos, $dir,
        $shell, $expire) = getpwent) {
    if ($gcos =~ /$pattern/i) {
        print "$gcos\n";
    }
}


#!/usr/bin/perl
$pattern = shift;
while (@fields = getpwent) {
    if ($fields[6] =~ /$pattern/i) {
        print "$fields[6]\n";
    }
}


For our final bit of self-torture, let’s restrict searches to normal users (uid > 500). It’s
an easy addition:

#!/usr/bin/perl
$pattern = shift;
while (@fields = getpwent) {
    if ($fields[6] =~ /$pattern/i and $fields[2] > 500) {
        print "$fields[6]\n";
    }
}


The PHP script 


PHP can be run by a web server (using CGI) or on its own (using the CLI). We’ll use
the CLI version. If you don’t have the CLI version, you can install it on Debian-based
systems with apt-get install php4-cli.* Our first PHP script will look like our early Perl
scripts:

#!/usr/bin/php
<?
$pattern = $argv[1];
$file = fopen("/etc/passwd", "r");
while ($line = fgets($file, 200)) {
    if (eregi($pattern, $line))
        echo $line;
}
fclose($file);
?>

Thanks to its origin as an accompaniment to web pages, PHP makes the unusual 
assumption that the default content of the file to be interpreted is plain text, and that 
PHP code is recognized only between an opening <? or <?php tag and a closing ?> tag. 
It echoes text to standard output. The eregi function does a regular-expression com- 
parison in a case-insensitive manner. 


Because PHP has borrowed a lot from Perl, it’s not surprising that it has a split 
function: 


#!/usr/bin/php
<?
$pattern = $argv[1];
$file = fopen("/etc/passwd", "r");
while ($line = fgets($file, 200)) {
    $fields = split(":", $line);
    if (eregi($pattern, $fields[4]))
        echo $line;
}
fclose($file);
?>


* Or php5-cli, when it’s available. 

But can we call a function like Perl’s getpwent to slice and dice the password file for
us? PHP doesn’t appear to have an equivalent, so we’ll stick with the parsing
approach to restrict the search to uid values over 500:


#!/usr/bin/php
<?
$pattern = $argv[1];
$file = fopen("/etc/passwd", "r");
while ($line = fgets($file, 200)) {
    $fields = split(":", $line);
    if (eregi($pattern, $fields[4]) and $fields[2] > 500)
        echo $line;
}
fclose($file);
?>


The Python script 


Python scripts look different from Perl and PHP scripts, because a statement ends at
the end of its line rather than at a C-style semicolon, and blocks are marked by
indentation rather than curly braces. Because indentation is significant, so are tab
characters. Our first Python script, like our earlier attempts in the other languages,
searches the password file and prints any line that contains the matching text:


#!/usr/bin/python
import re, sys
pattern = "(?i)" + sys.argv[1]
file = open("/etc/passwd")
for line in file:
    if re.search(pattern, line):
        print(line)


Python has namespaces (as does Perl) to group functions, which is why the functions 
in this script are preceded by the strings sys. and re.. This helps keep code modules 
a little more, well, modular. The "(?i)" in the third line of the script makes the 
match case-insensitive, similar to /i in Perl. 
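
To make the "(?i)" prefix concrete, here is a small Python 3 sketch comparing it with the re.IGNORECASE flag, which has the same effect. It also illustrates a point the scripts gloss over: the user's search string is interpreted as a regular expression, so re.escape is one way to force a literal match. The sample name comes from the password file shown earlier; the variable names are only for illustration.

```python
import re

name = "Alfredo de Darc"

# An inline "(?i)" prefix and the re.IGNORECASE flag are two spellings of
# the same case-insensitive search.
inline_match = re.search("(?i)" + "alf", name)
flag_match = re.search("alf", name, re.IGNORECASE)

# The search string is a regular expression, so "." matches any character.
# re.escape() makes every character in the string match literally instead.
loose = re.search("a.f", name, re.IGNORECASE)                # "." matches "l"
literal = re.search(re.escape("a.f"), name, re.IGNORECASE)   # no literal "a.f"

print(bool(inline_match), bool(flag_match), bool(loose), bool(literal))
```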


The next iteration, which splits the input line into fields, involves a straightforward 
addition to the first: 


#!/usr/bin/python
import re, sys
pattern = "(?i)" + sys.argv[1]
file = open("/etc/passwd")
for line in file:
    fields = line.split(":")
    if re.search(pattern, fields[4]):
        print(line)


Python has an equivalent to Perl’s getpwent function that enables us to restrict the
search to the field that contains names. Save the following script as finduser.py:

#!/usr/bin/python
import re, sys, pwd
pattern = "(?i)" + sys.argv[1]
for line in pwd.getpwall():
    if re.search(pattern, line.pw_gecos):
        print(line)


Now let’s see how it works: 


admin@server1:~$ ./finduser.py alf
('adedarc', 'x', 501, 501, 'Alfredo de Darc', '/home/adedarc', '/bin/bash')


In this script, the line we printed was a Python tuple rather than a string, and it was
pretty-printed. To print the line in its original format, use this:

#!/usr/bin/python
import re, sys, pwd
pattern = "(?i)" + sys.argv[1]
for line in pwd.getpwall():
    if re.search(pattern, line.pw_gecos):
        print(":".join(["%s" % v for v in line]))


The last line is needed to turn each field into a string (pw_uid and pw_gid are inte- 
gers) before joining them into one long, colon-separated string. Although Perl and 
PHP let you treat a variable as a string or a number, Python is stricter. 
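
The strictness is easy to see in isolation. In this short Python 3 sketch (the tuple is simply the adedarc record from the sample password file, typed in by hand), joining the raw fields fails because two of them are integers, while converting each field with str() first succeeds:

```python
# One record as a plain tuple; uid and gid are integers, the rest are strings.
fields = ("adedarc", "x", 500, 500, "Alfredo de Darc", "/home/adedarc", "/bin/bash")

try:
    ":".join(fields)  # fails: str.join() accepts only strings
except TypeError as error:
    print("join refused:", error)

# Converting every field with str() first rebuilds the original file format.
line = ":".join(str(value) for value in fields)
print(line)
```

Using str(value) here does the same job as the "%s" % v expression in the script.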


The final step is to restrict the searches to accounts with uid > 500:

#!/usr/bin/python
import re, sys, pwd
pattern = "(?i)" + sys.argv[1]
for line in pwd.getpwall():
    if line.pw_uid > 500 and re.search(pattern, line.pw_gecos):
        print(":".join(["%s" % v for v in line]))
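
If you expect to reuse the search, it may help to wrap it in a function. The following is one possible Python 3 arrangement, not the only one: find_users is a hypothetical name, and the entries parameter (an assumption added here so the function can be tried on sample data) defaults to whatever pwd.getpwall() returns on the local system.

```python
import pwd
import re


def find_users(pattern, entries=None, min_uid=500):
    """Return entries whose gecos field matches pattern and whose uid > min_uid.

    find_users is an illustrative helper, not a standard library function.
    """
    if entries is None:
        entries = pwd.getpwall()  # every account the system knows about
    regex = re.compile(pattern, re.IGNORECASE)
    return [
        entry
        for entry in entries
        if entry.pw_uid > min_uid and regex.search(entry.pw_gecos)
    ]
```

A caller could then write for entry in find_users("alf"): print(entry.pw_name, entry.pw_gecos), and choose the output format separately from the search logic.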


Choosing a Scripting Language 


The choice of a programming language, like the choice of a text editor or operating 
system, is largely a matter of taste. Some people find Perl unreadable, and others 
resist Python’s whitespace rules. Often the comparison goes no further; if you don’t 
like beets, why eat them? 


If you’re comfortable with the style of the language, the most important criterion is 
productivity for the task. bash is a quick way to create one-liners and short scripts, 
but it drags when scripts get over a hundred lines or so. Perl can be hard to read, but 
it’s powerful and has the benefit of the huge CPAN library. PHP looks like C, lacks 
namespaces, easily mingles code and output, and has some good libraries. Python 
may be the easiest to read and write, which is a special advantage for large scripts. 


Further Reading


The appendix contains some longer bash scripts that may be useful to system
administrators. Linux Shell Scripting with Bash by Ken Burtch (Sams) and the Advanced
Bash-Scripting Guide (http://www.tldp.org/LDP/abs/html) are good resources. If you
venture into the other scripting languages, any computer book with an animal on the
cover should be a safe bet (unless you find Curious George Learns COBOL in the
children’s section).


CHAPTER 11


Backing Up Data


Computers fail—disks break, chips fry, wires short circuit, and drinks dribble into 
the cases. Sometimes computers are stolen or are victims of human error. You may 
lose not only hardware and software but, more importantly, data. Restoring lost data 
takes time and money. In the meantime, your customers will be unhappy, and the 
government may take an interest if the data is needed for compliance with regula- 
tions. Making backup copies of all important data is cheap insurance against poten- 
tially expensive disasters, and business continuity requires a backup and recovery 
plan. 


In this chapter we’ll cover several tools for backing up data that can be useful in dif- 
ferent circumstances: 


rsync 
Sufficient for most user files; transfers files efficiently over a network to another 
system, from which you can retrieve them if disaster strikes the local system 


tar 
Traditional Unix program for creating compressed collections of files; creates 
convenient bundles of data that you can back up using other tools in this chapter 


cdrecord/cdrtools 
Records files to CD-Rs or DVDs 


Amanda 
Automates backups to tape; useful in environments with large amounts of data 


MySQL tools 
Provide ways to solve the particular requirements of databases 


Backing Up User Data to a Server with rsync


The most critical data to back up is data that is impossible, or very costly, to re-cre- 
ate. Usually this is user data that has grown over months or years of work. You can 
typically restore system data relatively easily by reinstalling from the original distribu- 
tion media. 


We'll focus here on making backups of user data from Linux desktop computers. A 
backup server needs enough disk space to store all of your user files. A dedicated 
machine is recommended. For a large office, disks may have a RAID (Redundant 
Array of Independent Disks) configuration to further protect against multiple failures. 


The Linux utility rsync is a copy program designed to replicate large quantities of 
data. It can skip previously copied files and fragments and encrypt data transfers 
with ssh, making remote backups with rsync faster and more secure than they are 
with traditional tools like cp, cpio, or tar. To check whether rsync is on your system, 
enter: 

# rsync --help
bash: rsync: command not found

If you see that message, you have to get the rsync package. To install it in Debian,
enter:

# apt-get install rsync

Usually, you'll want your backups to preserve the original ownership and permissions.
Thus, you'll need to ensure that all users have accounts and home directories
on the backup server.


rsync Basics 


The syntax of the rsync command is:

rsync options source destination

The major command-line options for rsync are:

-a
Archive. This option fulfills most of the previously mentioned requirements, and
it’s easier to type and pronounce than its equivalent, -Dgloprt.

-b
Make backup copies of already existing destination files instead of replacing
them. You usually won’t want to use this option unless you want to keep old
versions of every file. It can result in the backup servers being filled up very
quickly.

-D
Preserve devices. This option is used when replicating system files; it is not
needed for user files. Works only when rsync is run as root. Included in -a.

-g
Preserve the group ownership of files being replicated. This is important for
backups. Included in -a.

-H
Preserve hard links. If two names being replicated refer to the same file inode,
this preserves the same relationship in the destination. This option slows down
rsync somewhat, but its use is recommended.

-l
Copy symlinks as symlinks. You’ll almost always want to include this option;
without it, a symlink to a file would be copied as a regular file. Included in -a.

-n
Dry run: see what files would be transferred, but don’t actually transfer them.

-o
Preserve the user ownership of files being replicated. This is important for
backups. Included in -a.

-p
Preserve file permissions. This is important for backups. Included in -a.

-P
Enable --partial and --progress.

--partial
Enable partial file transfers. If rsync is aborted, it will be able to complete the
remainder of the file transfer when it resumes later.

--progress
Display file transfer progress.

-r
Enable recursion, transferring all subdirectories. Included in -a.

--rsh='ssh'
Use SSH for file transfer. This is recommended because the default transfer
protocol (rsh) is not secure. You can also set the RSYNC_RSH environment variable to
ssh to get the same effect.

-t
Preserve the modification times on each file. Included in -a.

-v
List the files being transferred.

-vv
Like -v, but also list the files being skipped.

-vvv
Like -vv, but also print rsync debugging info.

-z
Enable compression; more useful over the Internet than on a high-speed LAN.


There are many more rsync options that may come in useful in specialized situations.
You can find these on the manpage.


After the options, the arguments are the source and destination. Both source and 
destination can be paths to local files on the computer where rsync is running, rsync 
server designations (generally used for download file servers), or user@host:path des- 
ignations for ssh. Because rsync takes so many options and long arguments that 
won't regularly change, next we’ll write a bash script to run it. 


Making a User Backup Script 


This section presents a simple bash script that makes a backup from a user’s desktop 
to the backup server. The name of the backup server is assigned to the variable dest 
in this script. The variable user is assigned the username of the account that runs the 
script by running the whoami command and capturing the output as a string. The cd 
command changes the current directory to the user’s home directory. The logical-OR 
test condition that follows the cd command aborts the script if there is a failure. The 
one dot (.) all by itself specifies the current directory as the source argument. For the 
destination argument, we specify the username and hostname to log in as via ssh, 
followed by a dot to specify the current home directory on the destination host. 


Here’s the script: 


#!/bin/bash
export RSYNC_RSH=/usr/bin/ssh
dest=backup1
user=$(whoami)
cd || exit 1
rsync -aHPvz . "${user}@${dest}:."


The RSYNC_RSH environment variable contains the name of the shell that rsync will 
use. The default is /usr/bin/rsh, so we change it to /usr/bin/ssh here. Running this 
script replicates all the files in the home directory of the user who runs it into that 
user’s home directory on the backup server. Let’s take a look at how this works by 
running it for our sample user (after logging into her desktop): 


amy@desk12:~$ ./backup
Password:
building file list ...
14 files to consider
./
new-brochure.sxw
       37412 100%  503.91kB/s    0:00:00  (1, 62.5% of 16)
sales-plan-2006-08.sxw
       59513 100%    1.46MB/s    0:00:00  (2, 68.8% of 16)
sales-plan-2006-09.sxw
       43900 100%  691.47kB/s    0:00:00  (3, 75.0% of 16)
sales-plan-2006-10.sxw
       41285 100%  453.00kB/s    0:00:00  (4, 81.2% of 16)
vacation-request.sxw
       15198 100%  154.60kB/s    0:00:00  (5, 87.5% of 16)

sent 185942 bytes  received 136 bytes  24810.40 bytes/sec
total size is 210691  speedup is 1.13
amy@desk12:~$

rsync tells us that it is considering 14 files. It backs up only five files, though, because 
the other nine files are already on the backup server and have not been changed. This 
output shows the progress output as 100 percent when the files are complete and 
indicates how long each transfer took. On a high-speed LAN the transfer time will 
usually be less than one second for small or medium-sized files. On slower connec- 
tions or for very large files, you will see a progress status that gives the size and per- 
centage transferred so far, and an estimate of time to completion. 
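
For comparison with the scripting languages from the previous chapter, the same backup command can be assembled in Python 3. This is only a sketch under a few assumptions: build_backup_command and run_backup are hypothetical helper names, --rsh=ssh is passed explicitly instead of exporting RSYNC_RSH, and the -n (dry run) option described earlier is appended on request so you can preview a transfer.

```python
import getpass
import subprocess


def build_backup_command(host, user=None, dry_run=False):
    """Return the rsync argument list corresponding to the backup script."""
    if user is None:
        user = getpass.getuser()  # plays the role of $(whoami)
    options = "-aHPvz" + ("n" if dry_run else "")
    return ["rsync", options, "--rsh=ssh", ".", "%s@%s:." % (user, host)]


def run_backup(host, home_dir, dry_run=True):
    """Run rsync from home_dir; raises CalledProcessError if rsync fails."""
    command = build_backup_command(host, dry_run=dry_run)
    # cwd= replaces the script's "cd" so that "." means the home directory.
    subprocess.run(command, cwd=home_dir, check=True)


print(build_backup_command("backup1", user="amy", dry_run=True))
```

Building the command as a list rather than a single string avoids shell quoting problems if a username or hostname ever contains unusual characters.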


Listing Files on the Backup Server 


rsync can also provide a list of the files on the backup server. This is useful for verify- 
ing whether new and important files are really there, as well as for finding files that 
need to be restored because they’ve been lost or because the user needs to recover an 
old version. 


To get this listing, omit the options and destination arguments. Here’s a simple bash 
script that obtains the desired results: 

#!/bin/bash
dest=server1
user=$(whoami)
cd || exit 1
rsync "${user}@${dest}:." | more


Running this script produces results similar to the following: 


amy@desk12:~$ ./backlist
Password:
drwx------        4096 2006/08/09 13:20:41 .
-rw-------       10071 2006/08/09 12:35:21 .bash_history
-rw-r--r--         632 2006/07/27 23:03:06 .bash_profile
-rw-r--r--        1834 2006/07/26 19:59:08 .bashrc
-rwxr-xr-x         108 2006/07/27 23:06:51 .path
-rwxr-xr-x          79 2006/08/09 13:18:34 backlist
-rwxr-xr-x         137 2006/08/09 13:19:29 backrestore
-rwxr-xr-x          88 2006/08/09 13:03:46 backup
-rw-r--r--       37412 2006/07/17 14:40:52 new-brochure.sxw
-rw-r--r--       59513 2006/07/19 09:16:41 sales-plan-2006-08.sxw
-rw-r--r--       43900 2006/07/19 22:51:54 sales-plan-2006-09.sxw
-rw-r--r--       41285 2006/07/17 16:24:19 sales-plan-2006-10.sxw
-rw-r--r--       15198 2006/07/10 14:42:23 vacation-request.sxw
drwx------        4096 2006/08/09 13:12:25 .ssh
amy@desk12:~$


Restoring Lost or Damaged Files


No backup system is any good if lost files cannot be restored. Not only must we
be ready in case disaster strikes, but we must also test our recovery and restoration
plans to make sure they will work when they are most needed.


Our restoration script is just slightly more complicated than the previous script. 
We've added a way to specify individual files to be restored: 

#!/bin/bash
dest=server1
user=$(whoami)
cd || exit 1
for file in "$@" ; do
    rsync -aHPvz "${user}@${dest}:./${file}" "./${file}"
done

To restore files, we simply run the script, passing the names of the files to be restored 
as arguments on the command line. In the following example, we will intentionally 
remove one of our files and then watch it be restored: 

amy@desk12:~$ rm sales-plan-2006-10.sxw
amy@desk12:~$ ./backrestore sales-plan-2006-10.sxw
Password:
receiving file list ...
1 file to consider
sales-plan-2006-10.sxw
       41285 100%    6.56MB/s    0:00:00  (1, 100.0% of 1)

sent 42 bytes  received 39299 bytes  6052.46 bytes/sec
total size is 41285  speedup is 1.05
amy@desk12:~$


We can also restore all the files at once by using a dot as the filename. 


Automated Backups 


Backups can be automated using scripts similar to these run as cron jobs (discussed
in Chapter 10). SSH normally prompts for the user’s password, so for the logins to
work when the users are not present (say, nightly at 3 A.M.), you'll need to set up
key-based authentication: add each user’s public key to that user’s
~/.ssh/authorized_keys file on the backup server.


You have many options for creating backups. You might want to run a cron job 
script on the server daily or weekly, to make backups on another server. Businesses 
with remote offices may want to make regular backups of data from those offices 
over the Internet. Backups can also be burned to CD-Rs, DVDs, or tape, to make 
long-term archival copies that can be transported offsite. 


tar Archives


The tar command creates an archive file from one or more specified files or directo- 
ries. It can also list the contents of an archive, or extract files and directories from an 
archive. A tar archive file is also known as a tarfile or a tarball. 


A tar archive file offers several advantages over a directory of separate files. For 
example, it makes sending a whole directory by email a lot easier. Directories con- 
taining lots of similar files can be compressed more efficiently when the compression 
operates on all the data in a single file. 


A common use for a tar archive is to aid in the distribution of the source program 
files for free or open source software. In most cases, the tar archives are compressed 
with the gzip or bzip2 programs. However, if all the files being archived are already 
compressed (which is usually true of audio, video, and OpenOffice.org files), com- 
pressing the archive itself will not have much benefit. 


You can name a tarred file anything you want, but certain file extensions are conven- 
tionally used to tell recipients how to unpack the file. The most common extensions 
are: 


.tar
For uncompressed tar archives

.tar.gz or .tgz
For tar archives that have been compressed with the gzip compression program

.tar.bz2 or .tbz
For tar archives that have been compressed with the bzip2 compression program


The syntax of a tar command is:

tar options arguments

The options are traditionally given as single letters without a dash (-) character,
although many versions of tar also accept a dash. The most useful options are:

-b
Specify the block size (the default is units of 512 bytes).

-c
Create (write) a new archive.

-f filename
Read from or write to the archive filename. If filename is omitted or is -, the
archive file is written to standard output or read from standard input.

-j
Compress or uncompress the archive using bzip2 or bunzip2. Archives compressed
with bzip2 usually have the .bz2 suffix.

-p
Preserve file permissions.

-t
List the files in an existing archive.

-v
When creating or unpacking archives, list the contents. With the -t option, provide
more detail about the listed files.

-x
Extract (read) files from an existing archive.

-z
Compress or uncompress the archive using gzip or gunzip. Archives compressed
with gzip usually have the .gz suffix.


Creating a New Archive 


You can create a tar archive just to save a group of files for your own archiving pur- 
poses, to send them to someone else by email, or to make them available to the pub- 
lic (for example, on an FTP server). Some typical commands to archive the directory 
work-docs are as follows: 


• To create the archive work-docs.tar from the directory work-docs:
  $ tar -cf work-docs.tar work-docs

• To create the compressed archive work-docs.tar.gz from the directory work-docs:
  $ tar -czf work-docs.tar.gz work-docs

• To create the compressed archive work-docs.tar.bz2 from the directory work-docs:
  $ tar -cjf work-docs.tar.bz2 work-docs
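
If you would rather create archives from a script than from the command line, Python's standard tarfile module can write the same formats. The following Python 3 sketch is one way to do it; create_archive is a hypothetical helper name, and the demonstration works in a temporary scratch directory with a made-up notes.txt file so that no real data is touched.

```python
import os
import tarfile
import tempfile


def create_archive(source_dir, archive_path, compression="gz"):
    """Archive source_dir; compression may be "" (none), "gz", or "bz2"."""
    mode = "w:" + compression  # "w:gz" corresponds to tar -czf
    name_in_archive = os.path.basename(os.path.normpath(source_dir))
    with tarfile.open(archive_path, mode) as archive:
        # arcname stores only the directory's own name, not its full path.
        archive.add(source_dir, arcname=name_in_archive)


# Demonstration in a scratch directory (all names below are illustrative).
scratch = tempfile.mkdtemp()
docs = os.path.join(scratch, "work-docs")
os.mkdir(docs)
with open(os.path.join(docs, "notes.txt"), "w") as handle:
    handle.write("quarterly plan\n")

archive_path = os.path.join(scratch, "work-docs.tar.gz")
create_archive(docs, archive_path)
with tarfile.open(archive_path) as archive:
    print(archive.getnames())
```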


Extracting from an Archive 


At different times, you may need to extract files from an archive you created earlier 
(such as a backup), from an archive someone has mailed to you, or from an archive 
you have downloaded from the Internet (say, the source code for some software you 
need). 


Before you extract an archive, you should list and review its contents. You don’t 
want to accidentally replace existing files on your system with files from the archive, 
nor do you want to wind up with a mess of files that you have to clean up. 


Files in an archive should be arranged inside a directory, but not everyone does this, 
so you need to be careful to avoid extracting files into your current directory. It is 
usually a good idea to create a new directory on your computer in which to extract a 
tar archive. This keeps the extracted files apart from your other files, so they don’t 
get mixed up. It can also prevent the extraction from overwriting existing files. 

The -t option lists the names of the files in the archive and the directories they’ll be in
when the archive is unpacked. Adding the -v option increases the verbosity to give
details about each file in the tar archive, including the size of each file and its last
modification time. Here are some example commands:


• To list the files in the archive collection.tar:
  $ tar -tf collection.tar

• To list the files in the archive collection.tar.bz2 with extra details:
  $ tar -tvjf collection.tar.bz2

• To extract the files in collection.tar into the current directory, while preserving
  the original permissions:
  $ tar -xpf collection.tar
  The -x option extracts the files into the current directory. tar works silently
  unless the -v option is also used to list the files. The -p option preserves the
  original permissions, so the extracted files will have the same permission settings as
  the files that were archived.

• To extract the files in collection.tar.gz into the current directory, while preserving
  the original permissions:
  $ tar -xpzf collection.tar.gz

• To extract the files in collection.tar.bz2 into the current directory, while preserving
  the original permissions:
  $ tar -xpjf collection.tar.bz2

• To list and extract the files in collection.tar.bz2 into the current directory, while
  preserving the original permissions:
  $ tar -xpvjf collection.tar.bz2
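
The advice to list first and to extract into a fresh directory carries over to scripts. Here is a Python 3 sketch using the standard tarfile module; list_archive and extract_archive are hypothetical helper names, and the archive is built on the spot in a scratch directory purely for demonstration. Where the interpreter provides tarfile's "data" extraction filter, the sketch uses it as an extra guard against members that point outside the destination directory.

```python
import os
import tarfile
import tempfile


def list_archive(archive_path):
    """Return member names, like tar -tf; "r:*" auto-detects gzip or bzip2."""
    with tarfile.open(archive_path, "r:*") as archive:
        return archive.getnames()


def extract_archive(archive_path, dest_dir):
    """Extract into dest_dir (created if needed), like tar -xf run there."""
    os.makedirs(dest_dir, exist_ok=True)
    with tarfile.open(archive_path, "r:*") as archive:
        # Use the "data" filter only when this Python version provides it.
        extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
        archive.extractall(dest_dir, **extra)


# Build a tiny archive in a scratch directory to exercise both functions.
scratch = tempfile.mkdtemp()
source = os.path.join(scratch, "collection")
os.mkdir(source)
with open(os.path.join(source, "readme.txt"), "w") as handle:
    handle.write("hello\n")
archive_path = os.path.join(scratch, "collection.tar.gz")
with tarfile.open(archive_path, "w:gz") as archive:
    archive.add(source, arcname="collection")

names = list_archive(archive_path)
extract_dir = os.path.join(scratch, "extract.dir")
extract_archive(archive_path, extract_dir)
print(names)
print(os.listdir(os.path.join(extract_dir, "collection")))
```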


A Complete Example of Packing and Unpacking with tar 


The following shell session demonstrates the creation of a tar archive from a direc- 
tory of files: 


amy@desk12:~$ ls -dl monthly-reports
drwxr-xr-x 2 amy amy 4096 2006-08-11 14:15 monthly-reports
amy@desk12:~$ ls -l monthly-reports
total 228
-rw-r--r-- 1 amy amy 50552 2006-05-09 11:09 mr-2006-04.sxw
-rw-r--r-- 1 amy amy 51284 2006-06-06 15:44 mr-2006-05.sxw
-rw-r--r-- 1 amy amy 51428 2006-07-06 14:30 mr-2006-06.sxw
-rw-r--r-- 1 amy amy 54667 2006-08-07 10:06 mr-2006-07.sxw
amy@desk12:~$ tar -czf monthly-reports-aug.tar.gz monthly-reports
amy@desk12:~$ ls -l monthly-reports-aug.tar.gz
-rw-r--r-- 1 amy amy 199015 2006-08-14 12:46 monthly-reports-aug.tar.gz





The following shell session demonstrates listing the contents of the tar archive: 


amy@desk12:~$ ls -l monthly-reports-aug.tar.gz
-rw-r--r-- 1 amy amy 199015 2006-08-14 12:46 monthly-reports-aug.tar.gz
amy@desk12:~$ tar -tzf monthly-reports-aug.tar.gz
monthly-reports/
monthly-reports/mr-2006-04.sxw
monthly-reports/mr-2006-05.sxw
monthly-reports/mr-2006-06.sxw
monthly-reports/mr-2006-07.sxw
amy@desk12:~$ tar -tvzf monthly-reports-aug.tar.gz
drwxr-xr-x amy/amy 0 2006-08-11 14:15:12 monthly-reports/
-rw-r--r-- amy/amy 50552 2006-05-09 11:09:12 monthly-reports/mr-2006-04.sxw
-rw-r--r-- amy/amy 51284 2006-06-06 15:44:33 monthly-reports/mr-2006-05.sxw
-rw-r--r-- amy/amy 51428 2006-07-06 14:30:19 monthly-reports/mr-2006-06.sxw
-rw-r--r-- amy/amy 54667 2006-08-07 10:06:57 monthly-reports/mr-2006-07.sxw
amy@desk12:~$

The following shell session demonstrates extracting the contents of a tar archive:


amy@desk12:~$ mkdir extract.dir
amy@desk12:~$ cd extract.dir
amy@desk12:~/extract.dir$ tar -xzf ../monthly-reports-aug.tar.gz
amy@desk12:~/extract.dir$ tar -xvzf ../monthly-reports-aug.tar.gz
monthly-reports/
monthly-reports/mr-2006-04.sxw
monthly-reports/mr-2006-05.sxw
monthly-reports/mr-2006-06.sxw
monthly-reports/mr-2006-07.sxw
amy@desk12:~/extract.dir$ tar -xvvzf ../monthly-reports-aug.tar.gz
drwxr-xr-x amy/amy 0 2006-08-11 14:15:12 monthly-reports/
-rw-r--r-- amy/amy 50552 2006-05-09 11:09:12 monthly-reports/mr-2006-04.sxw
-rw-r--r-- amy/amy 51284 2006-06-06 15:44:33 monthly-reports/mr-2006-05.sxw
-rw-r--r-- amy/amy 51428 2006-07-06 14:30:19 monthly-reports/mr-2006-06.sxw
-rw-r--r-- amy/amy 54667 2006-08-07 10:06:57 monthly-reports/mr-2006-07.sxw
amy@desk12:~/extract.dir$ cd
amy@desk12:~$


Summary 

The most important things to remember about tar are: 
• -c reads from your files and creates (writes to) a tar file.
• -x extracts (reads from) a tar file and writes to your files.


Most Unix and Linux administrators have mixed up these options at least once. 
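
One way to keep the two directions straight is to script a round trip and check the
result. The following is a minimal sketch rather than a polished tool; the function
name tar_roundtrip and the scratch directory layout are our own assumptions. It creates
an archive with -c, lists it with -t, extracts it with -x into a scratch directory, and
then uses diff -r to confirm that the extracted copy matches the original:

```shell
# tar_roundtrip DIR: archive DIR, unpack the archive elsewhere, and compare.
# Hypothetical helper for illustration; returns 0 if the copy matches.
tar_roundtrip() {
    local src="$1" parent base work rc
    parent=$(dirname "$src")
    base=$(basename "$src")
    work=$(mktemp -d) || return 1

    # -c creates (writes) the archive from your files.
    tar -czf "$work/archive.tar.gz" -C "$parent" "$base" \
        || { rm -rf "$work"; return 1; }

    # -t lists what is inside without extracting anything.
    tar -tzf "$work/archive.tar.gz" > "$work/listing.txt" \
        || { rm -rf "$work"; return 1; }

    # -x extracts (reads) the archive and writes files under $work/out.
    mkdir "$work/out"
    tar -xpzf "$work/archive.tar.gz" -C "$work/out" \
        || { rm -rf "$work"; return 1; }

    # diff -r exits with status 0 only if both trees have the same contents.
    diff -r "$src" "$work/out/$base" > /dev/null
    rc=$?
    rm -rf "$work"
    return "$rc"
}
```

For example, tar_roundtrip monthly-reports && echo "archive is good" would report
success only if every file survived the trip through the archive.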


Saving Files on Optical Media 


Recordable CD and DVD media, called CD-Rs, DVD-Rs, and DVD+Rs, allow you to
save files in a convenient and compact form. They can be used for making backups
that can be stored offsite, and for distributing software or data to users or
customers. A CD-R can hold about 700 MB of data, while a DVD-R or DVD+R can
hold about 4.7 GB. A dual-layer version of DVD+R also exists, with a capacity
of 8.55 GB.

The difference between a DVD-R and a DVD+R is the technology used to position the
laser in the track groove for recording. The two methods are incompatible, so if
your drive supports only DVD-R or only DVD+R, you must use matching recordable
media. (Drives do exist that support both, allowing the use of either recordable
DVD media type.)


Recording files on a CD or DVD is not as straightforward or flexible as saving files on 
a hard disk. Rewritable media can get around some of the limitations, but they have 
a higher cost and reduced compatibility. In this section, we’ll focus on saving files on 
CD-Rs. The methods for DVDs are similar. 


A data CD consists of an array of sectors of 2048 bytes each. A special filesystem 
known as ISO-9660 is used to organize the files on the CD so that it can be read on a 
wide range of computers and other devices. Newer CD music players also support 
data CDs written in the ISO-9660 format, so they can access music files in com- 
pressed formats such as MP3. DVDs use a newer filesystem called Universal Disk 
Format (UDF). 


To record data, all CD and most DVD recorders require that the data be streamed to 
the drive continuously. If the data cannot be made available when the laser is trying 
to record it, the laser will have to stop, which breaks up the continuity of the record- 
ing. The methods used to record a CD were designed for slower computer systems, 
to maximize the reliability of these recordings. Today’s faster computers still face the 
challenge of providing data nonstop to today’s faster recording devices; however, 
many recorders now support Buffer Underrun Free technologies that enable them to 
continue the writing process even if the data buffer becomes empty at some point. 


The files to be recorded are typically first collected into a file called an ISO image file, 
which usually has the extension .iso. This file is then recorded directly to the CD-R. 
It is possible to record files directly to a CD-R without creating an .iso file first, but 
this method increases the risk that something else running on your computer could 
slow things down at the wrong time. 


The software needed to record a CD or DVD on Linux is located in a package called 
cdrecord (note that this package is undergoing a name change to cdrtools). If this 
package is not yet installed on your system, you should install it now using the meth- 
ods you have already learned. On Debian Sarge, you would run the command: 


# apt-get install cdrecord mkisofs 


Debian 4.0 forked the cdrecord package to one called wodim. Other packages include 
dvd+rw-tools (described at http://www.debianhelp.co.uk/burningdvd.htm) and K3b 
(http://www.k3b.org). 


Accessing Your CD-R Drive


Linux supports recording on IDE ATAPI CD-R drives through a special driver called 
ide-scsi. Most Linux distributions also include this driver in the kernel. If your sys- 
tem does not have the driver, you will need to load the driver module (installing it if 
needed), or possibly recompile your kernel. 


The ide-scsi driver emulates a SCSI device for software that is designed just for SCSI 
devices. Your IDE ATAPI CD drive and DVD drive will appear as if they are SCSI 
devices when the ide-scsi driver is active. 


The following command will list the SCSI devices on your system, so you can locate 
the emulated SCSI device number for your CD-R drive. It may list other devices as 
well, including any real SCSI devices if your computer actually has them. Run the 
command as root: 


# cdrecord -scanbus 
The output might look similar to this: 


Cdrecord-Clone 2.01 (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
scsidev: 'ATA'
devname: 'ATA'
scsibus: -2 target: -2 lun: -2
Linux sg driver version: 3.5.27
Using libscg version 'schily-0.8'.
scsibus1:
        1,0,0   100) 'SONY    ' 'CD-RW  CRX195E1 ' 'ZYS5' Removable CD-ROM
        1,1,0   101) 'DVD-16X ' 'DVD-ROM BDV316E ' '0052' Removable CD-ROM
        1,2,0   102) *
        1,3,0   103) *
        1,4,0   104) *
        1,5,0   105) *
        1,6,0   106) *
        1,7,0   107) *





Look for the device description that matches your CD-R recorder. If you have more 
than one device, the brand name and model should help identify the correct device. 
The output should at least list CD-R or CD-RW in the description. In this example, 
our CD recorder is on emulated SCSI device 1,0,0. 
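
If you need the device number in a script, you can pick it out of the scan output
automatically. This is only a sketch: the function name find_recorders is our own, and
it assumes that a recorder's description contains CD-R or CD-RW, as in the sample
output above. The pattern deliberately skips the words CD-ROM, which appear on every
device line:

```shell
# find_recorders: read `cdrecord -scanbus` output on standard input and print
# the bus,target,lun triple of each device whose description mentions
# CD-R or CD-RW. Hypothetical helper for illustration only.
find_recorders() {
    # "CD-R" must be followed by W, by a non-letter, or by the end of the
    # line, so that the trailing "Removable CD-ROM" does not match.
    grep -E 'CD-R(W|[^A-Za-z]|$)' | awk '{ print $1 }'
}
```

You would run it as root with cdrecord -scanbus 2>/dev/null | find_recorders. If it
prints more than one line, you have more than one candidate drive and should pick the
right one by brand and model.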


If the ide-scsi driver is not installed or not active, you may get output like this: 


Cdrecord-Clone 2.01 (i686-pc-linux-gnu) Copyright (C) 1995-2004 Jörg Schilling
cdrecord: No such file or directory. Cannot open '/dev/pg*'. Cannot open SCSI driver.
cdrecord: For possible targets try 'cdrecord -scanbus'.
cdrecord: For possible transport specifiers try 'cdrecord dev=help'.
cdrecord:
cdrecord: For more information, install the cdrtools-doc
cdrecord: package and read /usr/share/doc/cdrecord/README.ATAPI.setup .


If you get this kind of output, you will need to activate the ide-scsi driver before 
doing the actual recording step. 


Setting Defaults


A number of cdrecord parameters can be configured. For instance, you can configure 
cdrecord to recognize names for recording devices (so you don’t have to memorize 
the device numbers), and you can designate a default device. To configure cdrecord, 
log in as (or use su - to switch to) root. Then create a text file with your editor: 


# vi /etc/default/cdrecord 


We will put the following lines of text in this file to match the devices shown in our 
previous cdrecord -scanbus output. You will need to change these values to match the 
values for your own devices. Use any names you choose in place of cd and dvd. The 
whitespace between the fields on each line must be tabs, not spaces: 

CDR_DEVICE=cd
cd=1,0,0	-1	-1	""
dvd=1,1,0	-1	-1	""


If your Linux kernel is Version 2.6, you will most likely need to specify the device 
with the prefix ATA:, due to a redesign of the driver. In this case, the configuration 
file may look like this: 

CDR_DEVICE=cd
cd=ATA:1,0,0	-1	-1	""
dvd=ATA:1,1,0	-1	-1	""

You can also set the default recording speed for each device, right after the device 
number. -1 indicates that the default value should be used. The next number is the 
FIFO buffer size; once again, -1 specifies the default on the Linux system. The last 
item on the line allows you to pass a driver-specific option; we left it as an empty 
string. 


Newer versions of cdrecord support the option driveropts=burnfree to protect 
against buffer underruns. 
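
Because editors often turn tabs into spaces without telling you, it can be safer to
generate the file than to type it. The following sketch writes the entries with
printf, which produces real tab characters for \t. The function name is our own, and
the device numbers are the ones from our example; substitute the values for your
drives:

```shell
# write_cdrecord_defaults FILE: write a defaults file whose fields are
# separated by real tab characters. Hypothetical helper; the device
# numbers come from the sample scan output and must be adapted.
write_cdrecord_defaults() {
    local file="$1"
    {
        printf 'CDR_DEVICE=cd\n'
        # name=device <tab> speed <tab> FIFO size <tab> driver options
        printf 'cd=%s\t%s\t%s\t%s\n'  '1,0,0' '-1' '-1' '""'
        printf 'dvd=%s\t%s\t%s\t%s\n' '1,1,0' '-1' '-1' '""'
    } > "$file"
}
```

A cautious approach is to write to a scratch file first, for example
write_cdrecord_defaults /tmp/cdrecord.defaults, look it over, and only then copy it to
/etc/default/cdrecord as root.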


Preparing Files to Record on a CD-R 


The mkisofs command creates an ISO filesystem image file. It should contain all the 
files to be recorded on the CD-R. There are a lot of options for this command, but 
these are the important ones that we will use: 


-J
Include Joliet names for Windows compatibility.
-r
Include Rock Ridge names for Unix/Linux compatibility.
-v
Set verbose mode to show the progress status.
-V id_string
Specify a volume ID to name the disc to be created.
-o filename
Specify the filename of the ISO image being created.


Here is a sample command to include all the files from a specified directory: 
# mkisofs -JrvV "disc name" -o backup.iso /home/amy 


You will see a lot of output from this command. The output is useful for large file 
collections to indicate an estimate of how much time remains. If you prefer not to 
have this output, omit the -v option from the command. 
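
Before recording, it is worth checking that the image will actually fit on the disc.
Here is a small sketch of such a check. The function name iso_fits is our own, and the
default of 700 MB (counted conservatively as 700,000,000 bytes) is an assumption for an
ordinary CD-R; check the capacity printed on your own media and leave a little headroom
for padding:

```shell
# iso_fits IMAGE [CAPACITY_MB]: succeed if IMAGE is no larger than the
# given capacity in megabytes (default 700, an assumed CD-R size).
# Returns 2 if IMAGE cannot be read. Hypothetical helper for illustration.
iso_fits() {
    local image="$1" capacity_mb="${2:-700}" bytes limit
    [ -r "$image" ] || return 2
    # wc -c counts bytes; the arithmetic expansion strips any padding
    # spaces that some versions of wc print.
    bytes=$(( $(wc -c < "$image") ))
    limit=$(( $capacity_mb * 1000 * 1000 ))
    [ "$bytes" -le "$limit" ]
}
```

For example, iso_fits backup.iso || echo "backup.iso is too big for a CD-R" warns you
before you waste a blank disc. Pass a second argument such as 4700 to test against a
single-layer DVD instead.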


Recording the CD-R 


You can now record a CD-R with the ISO image you created. To perform the actual 
recording, log in as (or use su - to switch to) root. Root permissions are needed by 
the cdrecord program to access the raw SCSI layer, to modify process priorities, and 
to lock buffer space into RAM to avoid swapping. CD writing has critical timing 
dependencies, so it helps to keep the rest of the system as idle as possible. 


If you are using a rewritable CD-RW disc in a CD-RW drive, you need to erase 
(blank) the CD-RW before doing the recording: 


# cdrecord blank=fast padsize=63s -pad -dao -v -eject 


Some drives require the media to be ejected to reset the drive for the
next operation. Unless you have discovered that your drive does not
need this, use the -eject option, as shown here.








To record the ISO image created in the previous section, enter: 
# cdrecord padsize=63 -pad -dao -v -eject backup.iso 
Avoid doing any other work on a computer that is recording a CD or DVD. 


Some modern drives have special features such as burnfree that help avoid problems 
when the computer is not operating fast enough. Discs recorded with these fix-ups 
taking place may not be compatible with some older devices. If you find that your 
recordings sometimes fail, do them at a slower speed. You can change the speed by 
including the speed= option, which is documented in the cdrecord manpage. Slowing 
down the recording speed may be particularly important if the image file being 
recorded is on a network filesystem. 


Padding is necessary for some IDE ATAPI CD readers to work correctly with read- 
ahead operations that Linux and other systems usually do. You may find that omit- 
ting the padding works with newer drives, but because the problem occurs during 
reading, you should include padding to ensure that older drives will be able to read 
the CD-Rs you record. Otherwise, you may find that your critical backup files are 
not readable on a temporary replacement computer. 


Verifying the Recording


After you’ve recorded a CD or DVD, it’s a good idea to verify that the recording 
reads back correctly. The media may be defective, or the computer may have been 
bumped during the recording, causing the laser to be moved out of the groove. 


The correct way to verify a recording is to either compare the sectors recorded with 
the sectors on the hard disk or generate checksums of those sectors and compare 
them. Both methods must be used only with the actual data sectors, not the padding 
sectors. The following bash shell script makes this verification easy when the origi- 
nal ISO image file is available: 


#!/bin/bash
if [[ $# -lt 1 ]] ; then
    echo "usage: isomd5 <file_or_device> ..." 1>&2
    exit 1
fi
# First make sure isoinfo can read every file or device named.
for name in "$@" ; do
    isoinfo -di "${name}" 1>/dev/null || exit 1
done
for name in "$@" ; do
    # Number of sectors in the ISO filesystem (fourth word of the line).
    count=( $( isoinfo -di "${name}" \
               | egrep "^Volume size is: " ) )
    count="${count[3]}"
    # Size of each sector in bytes (fifth word of the line).
    bsize=( $( isoinfo -di "${name}" \
               | egrep "^Logical block size is: " ) )
    bsize="${bsize[4]}"
    # Checksum exactly that many sectors, so padding is never read.
    md5=$( dd \
           if="${name}" \
           ibs="${bsize}" \
           obs=4096 count="${count}" \
           2>/tmp/isomd5.$$.err \
           | md5sum )
    if [[ $? != 0 ]] ; then
        cat /tmp/isomd5.$$.err
        rm -f /tmp/isomd5.$$.err
        exit 1
    fi
    rm -f /tmp/isomd5.$$.err
    echo "${md5:0:32}" "" "${name}"
done


This script works by obtaining the number of sectors used by the ISO filesystem in 
the image file. It limits the number of sectors read into the MD5 checksum hashing 
program to exactly the number used. This avoids reading any padding sectors, which 
could vary in number. 
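
You can convince yourself that limiting the sector count really does ignore padding
without burning a disc. The following sketch uses scratch files only; the function name
md5_first_blocks and the file names are our own. It builds a four-sector "image", makes
a copy with 63 sectors of zeros appended (as padding would be), and checksums only the
first four sectors of each:

```shell
# md5_first_blocks FILE BLOCKSIZE COUNT: checksum only the first COUNT
# blocks of FILE, ignoring anything (such as padding) that follows.
# Hypothetical helper for illustration.
md5_first_blocks() {
    dd if="$1" ibs="$2" count="$3" 2>/dev/null | md5sum | cut -c 1-32
}

# Scratch files: an "image" of 4 sectors, and a copy with 63 sectors of
# zero padding appended, as a recorded disc might have.
demo=$(mktemp -d)
dd if=/dev/urandom of="$demo/image" bs=2048 count=4 2>/dev/null
cp "$demo/image" "$demo/padded"
dd if=/dev/zero bs=2048 count=63 2>/dev/null >> "$demo/padded"

md5_first_blocks "$demo/image"  2048 4
md5_first_blocks "$demo/padded" 2048 4    # same checksum as the line above
```

Running md5sum on the whole of the padded file would give a different value, which is
exactly the false alarm the count limit is there to avoid. Remove the scratch directory
with rm -rf "$demo" when you are done.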


We call this script isomd5. Give it the name of the ISO image file, as well as the name 
of the CD device normally used to read CD-Rs (with the newly recorded CD-R rein- 
serted). You should get a result similar to this: 


amy@desk12:~$ isomd5 backup.iso /dev/sr0
d41d8cd98f00b204e9800998ecf8427e  backup.iso
d41d8cd98f00b204e9800998ecf8427e  /dev/sr0
amy@desk12:~$


The checksum from the MD5 program is the 32-character hexadecimal part. Your checksum
values will differ from the ones shown here; what matters is that the two lines match.
If the checksum is not the same for both the ISO image file and the contents of the
CD-R drive, the recording is defective.
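
Comparing two 32-character strings by eye is tedious and error-prone, so you may want
to let the shell do it. This sketch assumes output in the form produced by isomd5 (a
checksum followed by a name on each line); the function name same_checksums is our own:

```shell
# same_checksums: read "checksum  name" lines on standard input and
# succeed only if at least one line was read and every checksum is
# identical. Hypothetical helper for illustration.
same_checksums() {
    awk '{ seen[$1] = 1; n++ }
         END { c = 0; for (k in seen) c++; exit !(n > 0 && c == 1) }'
}
```

A typical use would be isomd5 backup.iso /dev/sr0 | same_checksums && echo "Recording
verified", which prints the message only when the disc matches the image.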


A failed recording is derisively called a “coaster.” You can use it to protect your
coffee table from unsightly rings, but unlike a real drinks coaster, it’ll explode into a
shower of sharp fragments and sparks in a microwave.


When a write to disc fails, try in turn: 


1. Repeating the recording with another blank disc
2. Recording at a slower speed
3. Using a different batch or different brand of blank discs


If failures persist, you may have a defective recording drive. 








DVD Backups 


The steps shown in this section are specific to CD media, but DVD media can be 
recorded in similar ways, using the same software in the cdrecord or cdrtools package. 
Some DVD media—notably, the rare DVD-RAM—can operate much like hard drives, 
but these require a special drive that supports this mode of operation. 








Backing Up and Archiving to Tape with Amanda 


Tape is still a popular backup medium. The Advanced Maryland Automated Net- 
work Disk Archiver (Amanda) is an open source package that manages tape back- 
ups. Developed at the University of Maryland, it’s included in many distributions of 
Linux, including Debian. Amanda’s features include: 


• The use of traditional Unix backup formats such as tar and dump
• Operation over a LAN, backing up client data to a central tape server
• Support for backing up Windows clients via file shares
• Support for standard tape devices and many tape changers, jukeboxes, and
  stackers
• Ability to balance full backups over a multi-day backup cycle
• Support for incremental backups to write daily changes
• Data compression on either the client or the server, or via devices that include
  hardware compression
• Prevention of accidental overwriting of the wrong media
• A holding disk strategy that allows for staged or delayed writing to media
• Authentication through Kerberos or its own authentication scheme
• Data encryption for protection over unsafe networks


Installing Amanda 


Amanda has client and server components. The client is used on systems that have 
data that needs to be backed up. The server is used on systems that perform the 
backup work and write data to tape. 
Run the following command to install Amanda on the backup server: 

# apt-get install amanda-server 
Run the following command to install Amanda on each client Linux machine: 

# apt-get install amanda-client 


When you install these packages, the other packages that are needed will be 
included. If you wish to use the amplot program in Amanda, you will need to also 
install the gnuplot package. 


Amanda uses files in many different directories. These settings are configurable, but 
the defaults are: 


/etc/amanda
Configuration files (server)

/root
The file /root/.amandahosts

/usr/man/man8
Manpages

/usr/share/doc/amanda-common
Documentation files

/usr/share/doc/amanda-client
Client-specific documentation files

/usr/lib
Shared libraries used by Amanda programs

/usr/lib/amanda
Daemon programs and internal utilities

/usr/sbin
Command programs

/var/lib/amanda
Running state, log, and other files


Configuring Amanda


The /etc/services file should already have entries with the following names and port 
numbers. If these entries are not present, edit the /etc/services file and add them at 
the end. The comments are optional: 


/etc/services: 

amanda 10080/udp # amanda backup services 
amandaidx 10082/tcp # amanda backup services 
amidxtape 10083/tcp # amanda backup services 
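
Rather than reading through /etc/services by eye, you can let grep look for the three
entries. This is a sketch under our own assumptions: the function name
missing_amanda_services is hypothetical, and it only checks that each name is followed
by the expected port and protocol:

```shell
# missing_amanda_services [FILE]: print the name of each Amanda entry that
# is absent from FILE (default /etc/services). Prints nothing if all
# three are present. Hypothetical helper for illustration.
missing_amanda_services() {
    local file="${1:-/etc/services}" entry name port
    for entry in amanda:10080/udp amandaidx:10082/tcp amidxtape:10083/tcp; do
        name=${entry%%:*}    # text before the colon
        port=${entry#*:}     # text after the colon
        # Match "name <whitespace> port" at the start of a line.
        grep -Eq "^${name}[[:space:]]+${port}([[:space:]]|\$)" "$file" \
            || echo "$name"
    done
}
```

If missing_amanda_services prints nothing, the file is already complete. Any name it
does print is one you need to add.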


You may also need to edit the /etc/inetd.conf file, which should contain the following 
entries: 


/etc/inetd.conf: (for clients) 
amanda dgram udp wait backup /usr/sbin/tcpd /usr/lib/amanda/amandad 


/etc/inetd.conf: (for server) 

amandaidx stream tcp nowait backup /usr/sbin/tcpd /usr/lib/amanda/amindexd
amidxtape stream tcp nowait backup /usr/sbin/tcpd /usr/lib/amanda/amidxtaped

The first entry, named amanda, is needed on all clients. The other two entries are 
needed only on the server. If these lines are not present, edit the /etc/inetd.conf file 
and add them at the end. 


Amanda uses random ports after the initial communication. You should use Amanda 
over the Internet only through a VPN. This prevents the need to open a wide range 
of ports from the Internet into your LAN. 


Amanda runs as the user backup with disk group permissions. You will need to set 
access permissions for all files that you want to back up so that they can be read by 
Amanda. 


The Amanda server needs to be well connected to the local network, with sufficient 
bandwidth for the volume of data to be transferred. It should have a very large hold- 
ing disk, with enough space to hold twice the largest per-run dump size. A fast CPU 
is also needed if the server will be performing software compression. 
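
The holding disk rule of thumb is easy to check with df. The sketch below assumes you
already know the size of your largest per-run dump in kilobytes (Amanda's reports will
tell you once you have run a few backups); the function name and its arguments are our
own invention:

```shell
# holding_disk_ok DIR LARGEST_DUMP_KB: succeed if the filesystem holding
# DIR has at least twice LARGEST_DUMP_KB kilobytes available.
# Hypothetical helper for illustration.
holding_disk_ok() {
    local dir="$1" largest_kb="$2" avail_kb
    # df -Pk prints POSIX-format output in 1024-byte units; the fourth
    # column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( 2 * $largest_kb )) ]
}
```

For instance, if your largest run is about 20 GB, holding_disk_ok /path/to/holding/disk
20971520 succeeds only when at least 40 GB is free there. The path is a placeholder for
wherever you have placed your own holding disk.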


Amanda supports multiple configurations. Each configuration consists of a set of 
three files in a subdirectory of /etc/amanda: 


amanda.conf 
The main configuration file. You edit this to specify the disklist (see next item), 
tape device, backup frequency, your email address, reporting formats, and a 
huge array of other options. 

disklist 
This file specifies the hosts and disks to be backed up. 

tapelist 
This file lists active tapes, including dates when each was written. Amanda man- 
ages this file, so you can look at it but shouldn’t edit it. 







Reporting the full details of all of Amanda’s options would take several
pages, so we’ll leave their exploration up to you. Sample files with
useful comments are provided in the /etc/amanda/DailySet1 directory
when you install the Debian amanda-server package. For details on
these configuration files, see the Amanda manpage or
http://wiki.zmanda.com.








Amanda produces a report for each backup run. These detailed reports are sent by 
email to the user specified in the mailto option in the amanda.conf configuration file. 
You should review the reports regularly, particularly checking for errors and review- 
ing runtimes. 


Restoring Files Backed Up by Amanda 


Amanda uses standard Unix backup formats (tar or dump), which you specify in the 
configuration file. This allows backup tapes to be used to restore system files even if 
the Amanda system is not present. This can be crucial when restoring files after a 
complete disk failure. 


Amanda also provides indexed recovery tools to allow restoring of selected files. Be 
sure to configure index yes to have Amanda create the needed index files. The 
amrecover manpage provides full details. 


Backing Up MySQL Data 


Until now, we’ve been backing up files and directories. Databases have some special 
quirks that we need to address. Our examples use MySQL, but the same principles 
apply to PostgreSQL and other relational databases. 


If your MySQL server does not need to be available 24x7, a fast and easy offline raw 
backup method is: 
1. Stop the MySQL server: 
# /etc/init.d/mysqld stop 
2. Copy MySQL’s data files and directories. For example, if your MySQL data 
directory is /var/lib/mysql and you want to save it to /tmp/mysql-backup: 
# cp -r /var/lib/mysql /tmp/mysql-backup 
Instead of cp, you can use rsync, tar, gzip, or other commands mentioned earlier 
in this chapter. 
3. Start the server again: 
# /etc/init.d/mysqld start 
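
The danger with doing these three steps by hand is forgetting the last one, or
abandoning the procedure when the copy fails and leaving the server down. A small
wrapper can guard against that. This is a sketch only: the function name offline_backup
is our own, and it uses cp -a rather than cp -r so that ownership, permissions, and
timestamps are preserved:

```shell
# offline_backup INIT_SCRIPT DATADIR DEST: stop the server, copy DATADIR
# to DEST, and start the server again even if the copy failed.
# DEST should not already exist. Hypothetical helper for illustration.
offline_backup() {
    local init_script="$1" datadir="$2" dest="$3" rc
    "$init_script" stop || return 1
    # -a copies recursively and preserves ownership, permissions, and times.
    cp -a "$datadir" "$dest"
    rc=$?
    # Restart the server whether or not the copy succeeded.
    "$init_script" start || return 1
    return "$rc"
}
```

With the paths used in the steps above, you would run it as root like this:
offline_backup /etc/init.d/mysqld /var/lib/mysql /tmp/mysql-backup. The return status
tells you whether the copy itself worked.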


Online backups are trickier. If you have mutually independent MyISAM tables (no 
foreign keys or transactions), you could lock each one in turn, copy its files, and 
unlock it. But you may have InnoDB tables, or someone could write a transaction
involving multiple tables. Fortunately there are several reasonable noncommercial 
solutions, including mysqlhotcopy, mysqlsnapshot, replication, and mysqldump. 


mysqlhotcopy is a Perl script that does online raw backups of ISAM or MyISAM 
tables. The manpage includes many options, but here’s how to back up a single data- 
base named drupal: 


# mysqlhotcopy -u user -p password drupal /tmp 

Locked 57 tables in 0 seconds. 

Flushed tables (`drupal`.`access`, `drupal`.`accesslog`,
`drupal`.`aggregator_category`, `drupal`.`aggregator_category_feed`,
`drupal`.`aggregator_category_item`, `drupal`.`aggregator_feed`,
`drupal`.`aggregator_item`, `drupal`.`authmap`, `drupal`.`blocks`,
`drupal`.`book`, `drupal`.`boxes`, `drupal`.`cache`, `drupal`.`client`,
`drupal`.`client_system`, `drupal`.`comments`, `drupal`.`contact`,
`drupal`.`file_revisions`, `drupal`.`files`, `drupal`.`filter_formats`,
`drupal`.`filters`, `drupal`.`flood`, `drupal`.`forum`, `drupal`.`history`,
`drupal`.`locales_meta`, `drupal`.`locales_source`, `drupal`.`locales_target`,
`drupal`.`menu`, `drupal`.`node`, `drupal`.`node_access`,
`drupal`.`node_comment_statistics`, `drupal`.`node_counter`,
`drupal`.`node_revisions`, `drupal`.`permission`, `drupal`.`poll`,
`drupal`.`poll_choices`, `drupal`.`poll_votes`, `drupal`.`profile_fields`,
`drupal`.`profile_values`, `drupal`.`role`, `drupal`.`search_dataset`,
`drupal`.`search_index`, `drupal`.`search_total`, `drupal`.`sequences`,
`drupal`.`sessions`, `drupal`.`system`, `drupal`.`term_data`,
`drupal`.`term_hierarchy`, `drupal`.`term_node`, `drupal`.`term_relation`,
`drupal`.`term_synonym`, `drupal`.`url_alias`, `drupal`.`users`,
`drupal`.`users_roles`, `drupal`.`variable`, `drupal`.`vocabulary`,
`drupal`.`vocabulary_node_types`, `drupal`.`watchdog`) in 0 seconds.

Copying 171 files... 

Copying indices for 0 files... 

Unlocked tables. 

mysqlhotcopy copied 57 tables (171 files) in 1 second (1 seconds overall). 


mysqlsnapshot is even easier. It backs up all the ISAM or MyISAM tables on your 
server to one tar file per database: 

# ./mysqlsnapshot -u user -p password -s /tmp --split -n 

checking for binary logging... ok 

backing up db drupal... done 

backing up db mysql... done 


backing up db test... done 
snapshot completed in /tmp 


You'll find mysqlsnapshot at http://jeremy.zawodny.com/mysql/mysqlsnapshot. 


If you’ve set up MySQL replication for 24x7 availability, you can back up from a
slave server using one of the methods just described. You’ll also need to save
replication info (logs, configuration files, and so on). See Chapters 7 and 9 of
High Performance MySQL by Jeremy D. Zawodny and Derek J. Balling (O’Reilly) for the
gritty details.


For extra protection from hardware corruption (but not human error), set up replica- 
tion and provide your slave (and/or master) with RAID 1 (mirrored) disks. 

Many MySQL sites migrate data from MyISAM to InnoDB tables to get true database
transactions and better write performance. The authors of the InnoDB module
have a commercial product for online InnoDB backups named InnoDB Hot Backup,
which you can order from http://www.innodb.com/order.php.


The last method is usually the first mentioned in most documentation: mysqldump. 
Rather than a raw (verbatim) copy, mysqldump produces an ASCII dump of the spec- 
ified databases and tables. It works with all MySQL table types, including InnoDB. 
It’s relatively slow and the text files it produces are large, although they compress 
fairly well. It’s useful to create these dumps from time to time, because they contain 
a straightforward script for re-creating your databases and tables from scratch. You 
can use editors, grep, and other text tools to search through or modify the dump 
files. 


To lock all of your tables and dump them to a single file, enter:

# mysqldump -u user -ppassword -x --all-databases > /tmp/mysql.dump

You can pipe the output through gzip to save some time and space:

# mysqldump -u user -ppassword -x --all-databases | gzip > /tmp/mysql.dump.gz
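
If you make dumps regularly, you will probably want each one in its own dated file,
and you will want to know that the compressed file is intact. The following sketch
does both. The function name save_dump and the mysql-YYYY-MM-DD.dump.gz naming scheme
are our own assumptions:

```shell
# save_dump DIR: read a dump on standard input, compress it into
# DIR/mysql-YYYY-MM-DD.dump.gz, and verify the compressed file.
# Prints the name of the file it wrote. Hypothetical helper.
save_dump() {
    local dir="$1" out
    out="$dir/mysql-$(date +%Y-%m-%d).dump.gz"
    gzip > "$out" || return 1
    # gzip -t tests the integrity of the compressed file.
    gzip -t "$out" || return 1
    echo "$out"
}
```

You would use it at the end of the pipeline in place of gzip: mysqldump -u user
-ppassword -x --all-databases | save_dump /tmp. Note that the function cannot see
whether mysqldump itself succeeded; in bash, you can check ${PIPESTATUS[0]} right after
the pipeline for that.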


A new open source tool (free download, pay for support) called Zmanda Recovery
Manager for MySQL provides a useful frontend to many of these alternatives. The
Zmanda web site (http://www.zmanda.com/backup-mysql.html) has the details, but
we’ll mention some of the notable features here:

• Has a command-line interface.
• Backs up local databases, or remote databases over SSL.
• Emails the status of the backup procedure.
• Handles all table types, including InnoDB.
• Does not provide any new backup methods. Instead, it chooses among
  mysqldump, mysqlhotcopy, MySQL replication, or LVM snapshots.
• Supports recovery to a particular transaction or point in time.


Zmanda provides .tar.gz and .rpm files for many Linux distributions. For an installa- 
tion how-to for Debian, see http://www.howtoforge.com/mysql_zrm_debian_sarge. 


APPENDIX
bash Script Samples














This appendix contains several scripts that can be useful to you in your daily work, 
as well as serving as models for writing other scripts. You can download the scripts 
from http://www.centralsoft.org. 


Adding Users 


If you have a lot of turnover (such as in a university, where new students enter in 
bunches once or several times a year), this script can help you add them to your sys- 
tem quickly. It reads a file listing information about each user and invokes useradd 
with the proper arguments (see the section “User Management” in Chapter 8 for 
details about useradd and its variants): 


#!/bin/bash
expiredate=2009-02-18

if [[ $# -ne 1 ]] ; then
    echo ""
    echo "Please give exactly one file name."
    echo "The file will have one user per line."
    echo "Each line will have:"
    echo "    username"
    echo "    group"
    echo "    personal real name"
    echo ""
    echo "Sample line:"
    echo "alfredo marketing Alfredo de Darc"
    exit 1
fi

cat "$1" | while read username groupname realname
do
    # Skip blank (or incomplete) lines.
    if [[ -z $username || -z $groupname || -z $realname ]]; then
        continue
    fi

    # Check whether the user already exists.
    # If so, report this and skip this user.
    result=$( egrep "^$username:" < /etc/passwd )
    if [[ -n "$result" ]] ; then
        echo "User '$username' already exists"
        continue
    fi

    # Check whether the group already exists.
    # If not, add the group.
    result=$( egrep "^$groupname:" < /etc/group )
    if [[ -z "$result" ]] ; then
        groupadd "$groupname"
    fi

    # Add the user.
    useradd -c "$realname" \
            -d "/home/$username" \
            -e "$expiredate" \
            -f 365 \
            -g "$groupname" \
            -m \
            -s /bin/bash \
            "$username"

    if [[ $? == 0 ]]; then
        echo "Successfully added user '$username'."
    else
        echo "Error adding user '$username' (group \
'$groupname', real name '$realname')"
        exit 1
    fi

done
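
Since the script has to run as root and changes the system, it can be reassuring to
check the input file first. The sketch below reuses the same read loop but only prints
what it would do. It also shows a detail that is easy to miss: read puts the whole
remainder of the line into the last variable, which is why a real name can contain
spaces. The function name parse_users is our own:

```shell
# parse_users: read "username group real name" lines on standard input
# and show how read assigns the fields. Lines with fewer than three
# fields are skipped, as in the script above. Hypothetical helper.
parse_users() {
    local username groupname realname
    # -r keeps backslashes in the input from being treated specially.
    while read -r username groupname realname; do
        if [ -z "$username" ] || [ -z "$groupname" ] || [ -z "$realname" ]; then
            continue
        fi
        echo "user=$username group=$groupname name=$realname"
    done
}
```

Running parse_users < newusers.txt (where newusers.txt stands for your own input file)
lists every line the real script would act on, so you can spot typos before any account
is created.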


Random Password Generator 


Here’s a script that generates a password of any requested length, in ASCII characters: 


#!/bin/bash
n="$1"
[[ -n "$n" ]] || n=12
if [[ $n -lt 8 ]]; then
    echo "A password of length $n would be too weak"
    exit 1
fi
p=$( dd if=/dev/urandom bs=512 count=1 2>/dev/null \
     | tr -cd 'a-zA-Z0-9' \
     | cut -c 1-$n )
echo "${p}"

If this makes perfect sense to you as it stands, you deserve a reward.* While you’re
out, the rest of us will look a bit more closely at the inherent flaws in this code.


This code is typical of something you might inherit from a previous developer: no
comments, unhelpful variable names, and some magic incantations. Since you want
to make the world a better place, there are a few things you can do when writing
scripts like this one.


At the very least, you can leave comments describing the code’s purpose. These com- 
ments should be split into two parts: an overview right in the header (for example, 
indicating what the arguments passed to the script should specify, and any defaults), 
and explicit explanations in close proximity to difficult-to-understand processes. 
Don’t waste time just running through the basic commands used, because the main- 
tainer can look those up if he’s unfamiliar with them. However, where you employ a 
more exotic variant of a command, you should explicitly describe its effect and how 
you achieve it. 


Overall, you should aim to document the results you’re seeking with command sets, 
and why you're pursuing those results in the manner you’ve chosen. 


Now, here’s the explanation of the code for the password generator, in detail you’re
unlikely to see in the real world. The script begins with the usual starting comment
that tells the system to run the bash interpreter. Next, we assign the first argument
string to the variable n, which will be the number of characters to generate. We put
this in quotes because it may be a null string when the script is run with no
arguments. That string is then tested to determine whether it actually is null. The -n
argument means “non-zero length,” so the test is true if a string is given.


The two vertical bars execute the assignment that follows only if the test fails. This
forces a default length of 12 for our password. The next four lines check to see
whether the given length is too small; we have decided (based on classic
recommendations by security experts) that the minimum length should be 8.


The first statement in the loop body uses three system commands in a pipeline to 
generate one trial password. All three lines in the pipeline are placed inside $() to 
capture the output as a string that is then assigned to the variable p. 


To generate a random password, we need a source of random data; the system
provides that by combining a variety of sources of statistics into the /dev/urandom
pseudodevice. The dd command reads some binary data from the device. The tr
command with the -cd option deletes all characters that are not in the ranges a-z, A-Z,
and 0-9. The last command in the pipeline, cut, extracts the desired number of
characters.


* Go to Starbucks. Order a Venti Mocha Frappuccino. Tell them it’s on the house. Run. 
Don’t try to execute this command at your terminal and view the
results on the screen. You’ll go blind for 10 minutes and your dog will
start meowing. Did you give in to the temptation to do so? You may
have to execute an stty sane command to restore the screen to a useful
state.
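
The steps just described can be condensed into a short sketch. This is not the original listing: the function name genpass, the 256-byte read size, and the LC_ALL=C setting (which keeps tr from complaining about binary input) are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch of a random password generator following the steps described
# above. genpass and the 256-byte block size are illustrative choices.

genpass() {
    local n="${1}"              # requested length; may be empty
    local p=""

    [[ -n "${n}" ]] || n=12     # default length when no argument is given
    if [[ "${n}" -lt 8 ]]; then
        echo "Password length must be at least 8" 1>&2
        return 1
    fi

    # Keep reading random bytes until enough characters survive the filter.
    while [[ "${#p}" -lt "${n}" ]]; do
        p="${p}"$( dd if=/dev/urandom bs=256 count=1 2>/dev/null \
                   | LC_ALL=C tr -cd 'a-zA-Z0-9' )
    done

    # Trim the result to exactly the requested number of characters.
    echo "${p}" | cut -c "1-${n}"
}

genpass 16
```

Because the filtered output is captured in a variable rather than sent to the terminal, the raw binary data never reaches the screen.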








Authoritative DNS Lookup 


This script uses the dig command introduced in Chapter 3 to do DNS lookups, 
bypassing the cache of the local DNS caching server. One feature of this script is that 
it uses its own name to specify what DNS record type to look up. If the script is 
named a, it looks up DNS A records. If it is named soa, it looks up DNS SOA records. 
The name ptr is a special case that takes an IPv4 address and converts it to the proper 
in-addr.arpa form to do the actual lookup. You should make a copy of this script 
with the appropriate name for each of the common DNS record types you may need 
to look up: a, aaaa, mx, and so on. You can also use hard links or symlinks to create 
the aliases. 
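
One way to create the aliases with symlinks is sketched here. The helper name make_dns_aliases and the example path are hypothetical; the sketch assumes the script has already been installed under one of the record-type names.

```shell
#!/bin/bash
# Sketch: create symlink aliases for the lookup script, one per record type.
# make_dns_aliases is a hypothetical helper, not part of the original script.

make_dns_aliases() {
    local script="${1}"         # path to the installed script, e.g. ~/bin/a
    local dir name type

    dir=$( dirname "${script}" )
    name=$( basename "${script}" )
    for type in a aaaa any cname mx ns ptr soa txt; do
        # Skip the name the script already has.
        [[ "${type}" == "${name}" ]] && continue
        # A relative link target keeps the aliases valid if the
        # directory is moved.
        ln -sf "${name}" "${dir}/${type}"
    done
}

# Example: make_dns_aliases "${HOME}/bin/a"
```

When the script is run through one of these links, basename "$0" returns the name of the link, which is what selects the record type.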


Regardless of the name, the script takes a list of hostnames to look up as arguments: 
#!/bin/bash
#
# Copyright © 2006 - Philip Howard - All rights reserved
#
# script        a, aaaa, cname, mx, ns, ptr, soa, txt
#
# purpose       Perform direct DNS lookups for authoritative DNS
#               data. This lookup bypasses the local DNS cache
#               server.
#
# syntax        a     [ names ... ]
#               aaaa  [ names ... ]
#               any   [ names ... ]
#               cname [ names ... ]
#               mx    [ names ... ]
#               ns    [ names ... ]
#               ptr   [ names ... ]
#               soa   [ names ... ]
#               txt   [ names ... ]
#
# author        Philip Howard
#

# For use with ptr query: reverse the four octets of an IPv4 address.
function inaddr {
    awk -F. '{print $4 "." $3 "." $2 "." $1 ".in-addr.arpa.";}'
}

# The name this script was invoked under selects the record type.
query_type=$( exec basename "${0}" )

# Get and query for each host.
for hostname in "$@" ; do
    if [[ "${query_type}" == ptr ]] ; then
        # A typical scripting trick: when a case can begin
        # with a numeral, place a dummy character such as x in
        # front because the case syntax expects an alphanumeric
        # character.
        case "x${hostname}y" in
        ( x[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*y )
            hostname=$( echo "${hostname}" | inaddr )
            ;;
        ( * )
            ;;
        esac
    fi

    # Execute the query.
    dig +trace +noall +answer "${query_type}" "${hostname}" | \
    egrep "^${hostname}"
done
exit
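
The in-addr.arpa conversion can also be done with bash built-ins alone, which makes it easy to reject malformed addresses before querying. The following is a minimal sketch under that assumption; the function name to_inaddr is hypothetical and is not part of the script above.

```shell
#!/bin/bash
# Sketch: reverse an IPv4 address into in-addr.arpa form using only
# bash built-ins. to_inaddr is a hypothetical name.

to_inaddr() {
    local o1 o2 o3 o4 extra octet

    # Split the dotted quad on "." into four octets.
    IFS=. read -r o1 o2 o3 o4 extra <<< "${1}"
    [[ -z "${extra}" ]] || return 1

    for octet in "${o1}" "${o2}" "${o3}" "${o4}"; do
        # Each octet must be 1 to 3 digits and no larger than 255.
        [[ "${octet}" =~ ^[0-9]{1,3}$ ]] || return 1
        (( 10#${octet} <= 255 )) || return 1
    done

    echo "${o4}.${o3}.${o2}.${o1}.in-addr.arpa."
}

to_inaddr 192.0.2.10     # prints 10.2.0.192.in-addr.arpa.
```

The function returns a nonzero status for anything that is not a dotted quad, so a caller can fall back to treating the argument as an ordinary name.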


Sending Files Between Shell Sessions 


You can use the script presented in this section to send a file, or a directory of files 
(including all subdirectories), from one system to another using a shell session on 
each system. The script works by creating an rsync daemon (rsync is discussed in 
Chapter 11) in the foreground to send the specified file or directory. It displays a few 
different forms of rsync commands that could be used to receive that file or directory. 
This script does not need to exist on the receiving system, so it can even be used to 
send a copy of itself. The rsync package, however, must be installed on both systems. 


The sending system must have network access open for the port
number it uses to accept incoming rsync connections. The port number is
chosen at random from the range 12288 through 28671. You can
override the random port selection by using the -p option followed by a
port number. If your firewall rules only allow one or a few ports to be
connected, you must use those port numbers with this script.
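
The port arithmetic behind that range can be sketched as follows. The helper names random_port and valid_port are hypothetical; the sketch only illustrates how two random bytes map onto 12288 through 28671 and how a port given with -p might be sanity-checked.

```shell
#!/bin/bash
# Sketch: pick a port in the range 12288-28671 from two random bytes.
# random_port and valid_port are hypothetical helper names.

random_port() {
    local raw
    # Read 2 bytes and print them as one unsigned 16-bit integer (0-65535).
    raw=$( dd if=/dev/urandom bs=2 count=1 2>/dev/null | od -An -tu2 | tr -d ' ' )
    # 16384 possible values, starting at 12288.
    echo $(( raw % 16384 + 12288 ))
}

valid_port() {
    # Accept only whole numbers from 1 through 65535 (for use with -p).
    [[ "${1}" =~ ^[0-9]+$ ]] && (( 10#${1} >= 1 && 10#${1} <= 65535 ))
}

port=$( random_port )
echo "Using port ${port}"
```

Since 65536 is an exact multiple of 16384, every port in the range is equally likely.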








To transfer data, first run this script on the sending system. Once it outputs the
sample commands, select which command would be appropriate to use based on the IP
address or hostname that can reach the sending system, and the target location
where the file or directory is to be stored on the receiving system. Copy the selected
command line, and paste that command into the shell of the receiving system to
execute the rsync command that receives the data. The daemon will continue to run
when the transfer is complete, allowing you to transfer a file or directory multiple
times to different computers. Stop the daemon when the transfers are complete by
pressing Ctrl-C in the sending system’s shell window.
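
To give a feel for what gets pasted on the receiving side, here is a small sketch that prints the two equivalent ways of addressing an rsync daemon on a nonstandard port. The helper name receive_cmds, the hostname, the port, and the -avz options are placeholders for illustration; the real script chooses its own options.

```shell
#!/bin/bash
# Sketch: print the two equivalent rsync command forms a receiver could
# paste. receive_cmds is a hypothetical helper; the options shown (-avz)
# are only an example.

receive_cmds() {
    local host="${1}" port="${2}" module="${3}" target="${4}"
    # URL form: the port is part of the rsync:// URL.
    echo "rsync -avz rsync://${host}:${port}/${module} ${target}"
    # Double-colon form: the port is given with --port.
    echo "rsync -avz --port=${port} ${host}::${module} ${target}"
}

receive_cmds sender.example.com 20000 . /tmp/incoming
```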








This script has no security. Anyone who can reach the address and
port number on which it’s listening can retrieve the data being
transferred. You should not use this script to transfer confidential or secret
data; try scp or sftp instead. Be sure to terminate the daemon once the
desired transfers are completed.


The suggested name for this script is rsend: 








#!/bin/bash
#
# Copyright © 2006 - Philip Howard - All rights reserved
#
# script        rsend
#
# purpose       To start an rsync daemon in the shell foreground
#               to send a specified directory or file when
#               retrieved using one of the rsync command lines
#               shown, by pasting it in a shell session on another
#               host.
#
# usage         rsend [options] directory | file
#
# options       -c      include checksum in the rsync command lines
#               -d      change daemon to the specified directory
#               -n      include dryrun in the rsync command lines
#               -p      use the specified port number, else random
#               -s      include sparse in the rsync command lines
#               -u      user to run as, if started as root
#               -v      show extra information
#
# author        Philip Howard
#

umask 022
hostname=$( exec hostname -f )
whoami=$( exec whoami )
uid="${whoami}"

checksum=""
delete=""
delmsg=""
dryrun=""
padding="-------"
port=""
sparse=""
verbose=""

bar1="--------------------------"
bar1="#${bar1}${bar1}${bar1}"

bar2="##########################"
bar2="#${bar2}${bar2}${bar2}"


while [[ $# -gt 0 && "x${1:0:1}" = "x-" ]]; do
    case "x${1}" in
    ( x-c | x--checksum )
        checksum="c"
        ;;
    ( x--delete )
        delete=" --delete"
        delmsg="/delete"
        padding=""
        ;;
    ( x-d | x--directory )
        shift
        cd "${1}" || exit 1
        ;;
    ( x--directory=* )
        cd "${1:12}" || exit 1
        ;;
    ( x-n | x--dry-run )
        dryrun="n"
        ;;
    ( x-p | x--port )
        shift
        port="${1}"
        ;;
    ( x--port=* )
        port="${1:7}"
        ;;
    ( x-s | x--sparse )
        sparse="S"
        ;;
    ( x-u | x--user )
        shift
        uid="${1}"
        ;;
    ( x--user=* )
        uid="${1:7}"
        ;;
    ( x-v | x--verbose )
        verbose=1
        ;;
    esac
    shift
done
#-----------------------------------------------------------------
# Get a random number for a port.
#-----------------------------------------------------------------
if [[ -z "${port}" || "${port}" = 0 || "${port}" = . ]]; then
    port=$( dd if=/dev/urandom ibs=2 obs=2 count=1 2>/dev/null \
            | od -An -tu2 | tr -d ' ' )
    port=$(( port % 16384 ))
    port=$(( port + 12288 ))
fi

#-----------------------------------------------------------------
# Make up names for temporary files to be used.
#-----------------------------------------------------------------
conffile="/tmp/rsync-${whoami}-${port}-$$.conf"
lockfile="/tmp/rsync-${whoami}-${port}-$$.lock"


#-----------------------------------------------------------------
# This function adds quotes to strings that need them.
# Add single quotes if it has one of these: space $ " `
# Add double quotes if it has one of these: '
# Note: not all combinations will work.
#-----------------------------------------------------------------
function strquote {
    local str

    str=$( echo "${1}" | tr -d ' $"`' )
    if [[ "${str}" != "${1}" ]]; then
        echo "'${1}'"
        return
    fi

    str=$( echo "${1}" | tr -d "'" )
    if [[ "${str}" != "${1}" ]]; then
        echo '"'"${1}"'"'
        return
    fi

    echo "${1}"
    return 0
}

if [[ $# -gt 1 ]]; then
    echo "Only one name (directory or file)" 1>&2
    exit 1
elif [[ $# -eq 1 ]]; then
    name="${1}"
else
    name=$( exec pwd )
fi


#-----------------------------------------------------------------
# Set up a temporary config file.
#
# Arguments:
#   $1  Directory transferred, or where transfer is starting
#   $2  Text for the module's comment line
#   $3  File transferred (if single file specified)
#-----------------------------------------------------------------
function configout {
    echo "lock file = ${lockfile}"
    echo "log file = /dev/stderr"
    echo "use chroot = false"
    echo "max connections = 32"
    echo "socket options = SO_KEEPALIVE"
    echo "list = yes"
    echo "[.]"
    echo "path = ${1}"
    echo "read only = yes"
    echo "uid = ${uid}"
    echo "comment = ${2}"
    if [[ -n "${3}" ]]; then
        echo "include = **/${3}"
        echo "exclude = **"
    fi
}
#-----------------------------------------------------------------
# Get directory and file.
#-----------------------------------------------------------------
if [[ ! -e "${name}" ]]; then
    echo "does not exist:" $( strquote "${name}" ) 1>&2
    exit 1
elif [[ -d "${name}" ]]; then
    p=$( exec dirname "${name}" )
    b=$( exec basename "${name}" )
    d="${name}"
    f=""
    r=$( cd "${name}" && exec pwd )
    announce="${d}"
    rsyncopt="-a${checksum}${dryrun}H${sparse}vz${delete}"
    configout "${d}/." "directory:${d}/" >"${conffile}"
elif [[ -f "${name}" ]]; then
    p=$( exec dirname "${name}" )
    b=$( exec basename "${name}" )
    d="${p}"
    f="${b}"
    r=$( cd "${p}" && exec pwd )
    r="${r}/${b}"
    announce="${d}/${f}"
    rsyncopt="-a${checksum}${dryrun}${sparse}vz"
    configout "${d}/." "file:${d}/${f}" >"${conffile}"
elif [[ -L "${name}" ]]; then
    p=$( exec dirname "${name}" )
    b=$( exec basename "${name}" )
    d="${p}"
    f="${b}"
    r=$( cd "${p}" && exec pwd )
    r="${r}/${b}"
    announce="${d}/${f}"
    rsyncopt="-a${checksum}v"
    configout "${d}/." "symlink:${d}/${f}" "${f}" >"${conffile}"
fi

if [[ -n "${verbose}" ]]; then
    echo "${bar2}"
    ls -ld "${conffile}"
    echo "${bar2}"
    cat "${conffile}"
fi

function showrsync {
    echo -n "rsync ${rsyncopt} "
    if [[ -n "${oldfmt}" ]]; then
        echo "--port=${port}" $( strquote "${1}::${2}" ) $( strquote "${3}" )
    else
        echo $( strquote "rsync://${1}:${port}/${2}" ) $( strquote "${3}" )
    fi
    return
}

#-----------------------------------------------------------------
# These functions show rsync commands for hostname and IP address.
#-----------------------------------------------------------------
function getip {
    case $( exec uname -s ) in
    ( SunOS )
        netstat -i -n | awk '{print $4;}'
        ;;
    ( Linux )
        ifconfig -a | awk '{if($1=="inet")print substr($2,6);}'
        ;;
    ( * )
        netstat -i -n | awk '{print $4;}'
        ;;
    esac
    return
}

function ipaddr {
    getip \
    | egrep '^[0-9]*\.[0-9]*\.[0-9]*\.[0-9]*$' \
    | egrep -v '^0\.|^127\.' \
    | head -2 \
    | while read ipv4 more ; do
        showrsync "${ipv4}" "$@"
    done
    return
}

function showcmd {
    ipaddr "${2}" "${3}"
    showrsync "${1}" "${2}" "${3}"
    return
}
#-----------------------------------------------------------------
# Announce the shell commands to receive this data.
#-----------------------------------------------------------------
echo "${bar2}"
echo "# sending ${announce}"
echo "# paste ONE of these commands in a remote shell to receive"

if [[ -d "${name}" ]]; then
    echo "${bar1}"
    showcmd "${hostname}" . .

    echo "${bar1}"
    showcmd "${hostname}" . "${b}"

    if [[ "${d}" != "${b}" && "${d}" != "${r}" ]]; then
        echo "${bar1}"
        showcmd "${hostname}" . "${d}"
    fi

    echo "${bar1}"
    showcmd "${hostname}" . "${r}"
else
    echo "${bar1}"
    showcmd "${hostname}" "./${f}" "${b}"

    s=$( exec basename "${d}" )
    s="${s}/${f}"
    if [[ "${s}" != "${b}" ]]; then
        echo "${bar1}"
        showcmd "${hostname}" "./${f}" "${s}"
    fi

    if [[ "${name}" != "${b}" \
       && "${name}" != "${s}" \
       && "${name}" != "${r}" ]]; then
        echo "${bar1}"
        showcmd "${hostname}" "./${f}" "${name}"
    fi

    echo "${bar1}"
    showcmd "${hostname}" "./${f}" "${r}"
fi

echo "${bar1}"
echo "# press ^C here when done"
echo "${bar2}"

s="DONE"
trap 's="SIGINT ... DONE"' INT
trap 's="SIGTERM ... DONE"' TERM
rsync --daemon --no-detach "--config=${conffile}" "--port=${port}"
rm -f "${conffile}" "${lockfile}"
echo "${s}"
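
The quoting helper in this script warns that not every combination of characters will work. If the receiving shell is also bash, one alternative worth considering is the %q format of bash's built-in printf, which escapes a string so that the shell reads it back unchanged. The wrapper name shquote below is hypothetical; this is a sketch of the technique, not a drop-in replacement tested against the script.

```shell
#!/bin/bash
# Sketch: quote a string for pasting into another bash shell using the
# built-in printf %q. shquote is a hypothetical name.

shquote() {
    printf '%q\n' "${1}"
}

shquote "my file's \$name"
```

Strings that need no quoting pass through unchanged, so simple paths still display cleanly in the suggested command lines.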


Integrating ssh and screen 


You should already be familiar with the ssh command, which connects to another
computer and starts a shell there in a secure manner. The screen command is a
useful tool that allows such a shell session to be held in an active state, with its screen
contents intact, when you disconnect from the remote computer. The held shell
session can then be reconnected later, even from a different computer. It is also
possible to have two or more connections to the same shell session.


The following script makes an ssh connection and starts a named screen session in
one command. The benefit of using this script is quicker connecting and
disconnecting when working with multiple servers.


This script is used much like the ssh command. The ssh syntax that specifies the
username and hostname of the remote session is expanded to also include a session
name. You can create multiple sessions on the remote host under the same
username with different session names. The session name is optional. If it is not given,
this script runs the ssh command in the normal way, without running screen. The full
syntax of this script, including the ssh options it supports, can be seen in the script’s
comments.
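
The expanded target syntax can be illustrated with a small parser. This sketch uses bash parameter expansion rather than the cut pipelines found in the script itself, and the function name parse_target is hypothetical; it sets the variables s (session), u (username), and h (hostname).

```shell
#!/bin/bash
# Sketch: split the target forms accepted by ss using parameter expansion
# instead of cut. parse_target is a hypothetical helper; it sets the
# variables s (session), u (username), and h (hostname).

parse_target() {
    local arg="${1}"
    s="" u="" h=""
    case "${arg}" in
    ( */*@* )                   # session/username@hostname
        s="${arg%%/*}"
        arg="${arg#*/}"
        u="${arg%%@*}"
        h="${arg#*@}"
        ;;
    ( *@*/* )                   # username@hostname/session
        u="${arg%%@*}"
        arg="${arg#*@}"
        h="${arg%%/*}"
        s="${arg#*/}"
        ;;
    ( *@*@* )                   # session@username@hostname
        s="${arg%%@*}"
        arg="${arg#*@}"
        u="${arg%%@*}"
        h="${arg#*@}"
        ;;
    ( *@* )                     # username@hostname (no session)
        u="${arg%%@*}"
        h="${arg#*@}"
        ;;
    ( * )
        return 1
        ;;
    esac
}

parse_target session1/lisa@centrhub && echo "session=${s} user=${u} host=${h}"
# prints: session=session1 user=lisa host=centrhub
```

The order of the patterns matters: the plain username@hostname pattern would also match the longer forms, so it has to come last.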


The suggested name for this script is ss: 


#!/usr/bin/env bash
#
# Copyright © 2006 - Philip Howard - All rights reserved
#
# command       ss (secure screen)
#
# purpose       Establish a screen based background shell session
#               via secure shell communications.
#
# syntax        ss [options] session/username@hostname
#               ss [options] session@username@hostname
#               ss [options] username@hostname/session
#               ss [options] username@hostname session
#
# options       -h hostname
#               -h=hostname
#               -i identity
#               -i=identity
#               -l loginuser
#               -l=loginuser
#               -m              Multi-display mode
#               -p portnum
#               -p=portnum
#               -s session
#               -s=session
#               -t              Use tty allocation (default)
#               -T              Do NOT use tty allocation
#               -4              Use IPv4 (default)
#               -6              Use IPv6
#               -46 | -64       Use either IPv6 or IPv4
#
# requirements  The local system must have the OpenSSH package
#               installed. The remote system must have the
#               OpenSSH package installed and have the sshd
#               daemon running. It must also have the screen(1)
#               program installed. Configuring a .screenrc
#               file on each system is recommended.
#
# note          The environment variable SESSION_NAME will be set
#               in the session created under the screen command
#               for potential use by other scripts.
#
# author        Philip Howard
#

whoami=$( exec whoami )
hostname=$( exec hostname )

h=""
i=( )
m=""
p=( )
s=""
t=( -t )
u="${whoami}"
v=( -4 )

while [[ $# -gt 0 ]]; do
    case "x${1}" in
    ( x*/*@* )
        # Example: session1/lisa@centrhub
        u=$( echo "x${1}" | cut -d @ -f 1 )
        u="${u:1}"
        s=$( echo "x${u}" | cut -d / -f 1 )
        s="${s:1}"
        u=$( echo "x${u}" | cut -d / -f 2 )
        h=$( echo "x${1}" | cut -d @ -f 2 )
        shift
        break
        ;;
    ( x*@*/* )
        # Example: lisa@centrhub/session1
        u=$( echo "x${1}" | cut -d @ -f 1 )
        u="${u:1}"
        h=$( echo "x${1}" | cut -d @ -f 2 )
        s=$( echo "x${h}" | cut -d / -f 2 )
        h=$( echo "x${h}" | cut -d / -f 1 )
        h="${h:1}"
        shift
        break
        ;;
    ( x*@*@* )
        # Example: session1@lisa@centrhub
        s=$( echo "x${1}" | cut -d @ -f 1 )
        s="${s:1}"
        u=$( echo "x${1}" | cut -d @ -f 2 )
        h=$( echo "x${1}" | cut -d @ -f 3 )
        shift
        break
        ;;
    ( x*@* )
        # Example: lisa@centrhub
        u=$( echo "x${1}" | cut -d @ -f 1 )
        u="${u:1}"
        h=$( echo "x${1}" | cut -d @ -f 2 )
        # Next argument should be session name.
        shift
        if [[ $# -gt 0 ]]; then
            s="${1}"
            shift
        fi
        break
        ;;
    ( x-h=* )
        h="${1:3}"
        ;;
    ( x-h )
        shift
        h="${1}"
        ;;
    ( x-i=* )
        i="${1:3}"
        if [[ -z "${i}" ]]; then
            i=( )
        else
            i=( -i "${1:3}" )
        fi
        ;;
    ( x-i )
        shift
        i=( -i "${1}" )
        ;;
    ( x-l=* | x-u=* )
        u="${1:3}"
        ;;
    ( x-l | x-u )
        shift
        u="${1}"
        ;;
    ( x-m | x--multi )
        m=1
        ;;
    ( x-p=* )
        p="${1:3}"
        if [[ -z "${p}" ]]; then
            p=( )
        else
            p=( -p "${1:3}" )
        fi
        ;;
    ( x-p )
        shift
        p=( -p "${1}" )
        ;;
    ( x-s=* )
        s="${1:3}"
        ;;
    ( x-s )
        shift
        s="${1}"
        ;;
    ( x-t )
        t=( -t )
        ;;
    ( x-T )
        t=( )
        ;;
    ( x-4 )
        v=( -4 )
        ;;
    ( x-6 )
        v=( -6 )
        ;;
    ( x-46 | x-64 )
        v=( )
        ;;
    ( x-* )
        echo "Invalid option: '${1}'"
        die=1
        ;;
    ( * )
        echo "Invalid argument: '${1}'"
        die=1
        ;;
    esac
    shift
done
#-----------------------------------------------------------------
# Make sure essential information is present.
#-----------------------------------------------------------------
if [[ -z "${u}" ]]; then
    echo "User name is missing"
    die=1
fi

if [[ -z "${h}" ]]; then
    echo "Host name is missing"
    die=1
fi

[[ -z "${die}" ]] || exit 1

# Build the ssh command; start screen only if a session name was given.
c=( ssh "${v[@]}" "${i[@]}" "${p[@]}" "${t[@]}" "${u}@${h}" )
if [[ -n "${s}" ]]; then
    o="-DR"
    [[ -n "${m}" ]] && o="-x"
    x="exec /usr/bin/env SESSION_NAME='${s}' screen ${o} '${s}'"
    c=( "${c[@]}" "${x}" )
fi

exec "${c[@]}"
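
The script keeps each optional ssh argument in its own array so that an unused option contributes no words at all to the final command. The following sketch isolates that idea; build_cmd and the global array cmd are hypothetical names used only for this illustration.

```shell
#!/bin/bash
# Sketch: optional arguments held in arrays disappear cleanly when empty.
# build_cmd is a hypothetical helper that fills the global array cmd.

build_cmd() {
    local port="${1}" target="${2}"
    local p=( )                         # empty: contributes no words
    [[ -n "${port}" ]] && p=( -p "${port}" )
    cmd=( ssh "${p[@]}" -t "${target}" )
}

build_cmd "" lisa@centrhub
echo "${#cmd[@]} words: ${cmd[*]}"      # 3 words: ssh -t lisa@centrhub

build_cmd 2222 lisa@centrhub
echo "${#cmd[@]} words: ${cmd[*]}"      # 5 words: ssh -p 2222 -t lisa@centrhub
```

Had the port been kept in a plain string and expanded inside quotes, an empty value would have produced an empty argument that ssh would try to interpret.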










Index


Symbols


\ (backslash), 212
[[ ]] (double brackets), 218
$ (dollar sign), 217
$? (dollar question), 218
$$ (double dollar sign), 218
` (grave), 217
% (percent), 141
# (pound sign), 213, 169
" (double quotes), 217
' (single quote), 217
_ (underscore), 217


A


ab (benchmarking program, Apache), 144
access log files, 140
adduser command, 184, 186
Alias directives, 134
Amanda, 236, 251-254
configuring, 253
installing, 252
restores from, 254
Apache, 16, 33-34, 122-152
alternatives to, 162
benchmarking, 144
configuration files, 127-140
authentication and authorization, 130
containers and aliases, 133
directives, 128-130
pattern matching, 133
PHP module-specific directives, 138
resource directives, 134
server-side includes, 134-138
virtual hosts, 138-140
DNS and, 124, 140, 149 
installation, 124 
logging, 140-142 
cron jobs, 140 
log splitting and rotation, 140 
vlogger, 141 
Webalizer, 142 
models and prefork model, 144 
mod_php installation, 125 
scripting language modules, 123 
SSL/TLS encryption, 142 
suEXEC support, 143 
APC, 162 
apt-get, 15 
quota package, installing, 17 
arguments, command line, 212 
authentication and authorization, 130 


B 


backslash (\), 212 
backups, 236 
automation of, 241 
listing files on the backup server, 240 
MySQL databases, 254-256 
optical media, 245-251 
restores, 241 
rsync, 237-240
bash script, 239 
source and destination arguments, 239 
tape backup using Amanda, 251-254 
tar archives, 242-245 










bash, 211 
arithmetic, 219 
backup script, 239 
bash script samples, 257-272
adding users, 257 
authoritative DNS lookup, 260 
file transfers between shell 
sessions, 261 
random password generation, 258 
ssh and screen commands, 
integrating, 268 
cron jobs, 225 
default path, 214 
expressions, 218 
if, elif, and then, 219 
I/O redirection, 215
loops, 223 
pathnames, 213 
permissions, 213 
pipes, 215 
script troubleshooting, 221 
shell variables, 220 
variables, 217 
bastion hosts, 173 
batch jobs, 212 
benchmarking, 144 
Beowulf, 154 
BIND (Berkeley Internet Name 
Daemon), 40-71 
BIND 4, 40 
BIND tools, 62-65
chroot environments and non-root 
usage, 42 
components, 40 
initial minimal setup, 18 
troubleshooting, 66-71 
versions, 40 
Bourne, Stephen, 211 
break command, 225 
Brehm, Till, 74 
bzip2, 242 


C


CAs (certificate authorities), 143 
cdrecord, 246 
configuration, 248 
CD-Rs, 245 
accessing, 247 
preparation for recording, 248 
recording, 249 
certificates, 143 


CGI (Common Gateway Interface), 123 
CGI directories and interpreters, 137 
chkconfig command, 171 
chmod command, 214 
chroot environments, 18, 42 
CIFS (Common Internet File System), 164 
ClarkConnect, 176 
clock synchronization, 36 
clusters, 154 
HA (high-availability) configuration, 161 
Linux Virtual Server, 154 
load balancing (see load balancing) 
realservers, 157 
configuring, 157 
scaling without LB and HA, 162 
testing, 159-161 
code caches, 162 
command, 212 
comment character (#), 213 
Common Gateway Interface (CGI), 123 
Common Unix Printing System (see CUPS) 
Common Vulnerabilities and Exposures 
(CVE) list, 22 
Comprehensive Perl Archive Network 
(CPAN), 36 
conf.d directory, 127 
connection sharing, 177 
containers, 133 
continue command, 225 
CPAN (Comprehensive Perl Archive 
Network), 36 
cron jobs, 225 
crontab file, 225 
CUPS (Common Unix Printing System), 183 
CLI commands, 185 
CVE (Common Vulnerabilities and 
Exposures) list, 22 


D 


daemon-monitoring daemons (DMDs), 96 
data caches, 162 
data munging, using scripts, 227 
databases (see MySQL) 
Debian, 9 
default packages, changing, 15 
installation, 10 
mail transport agents, 105 
Postfix (see Postfix) 
startup scripts, modifying, 16 
demilitarized zone (DMZ), 174
DHCP (Dynamic Host Configuration
Protocol), 168-172 
installing, 169 
IPv6 addressing with radvd, 172 
starting up, 171 
static IP addressing, 172 
dhcpd.conf file, 171, 175 
Firestarter version, 177 
dhcpd.leases file, 171 
dig command, 41, 260 
directives, 128 
Directory directives, 133 
disk usage, managing (see quotas) 
distributed filesystems, 164 
distribution, reasons for choosing, 9, 163, 
176 
dist.txt, 77 
djbdns, 40 
DMDs (daemon-monitoring daemons), 96 
DMZ (demilitarized zone), 174 
DNS (Domain Name System), 38 
administrative responsibilities, 45 
bash script for authoritative lookups, 260 
caching-only servers, 49 
configuration files, editing, 50-62 
finding domains, 46 
firewall issues, 48 
initial minimal setup, 18 
primary and secondary servers, 47-49
queries, 46-47
server setup, 14, 41-44
configuration, 44 
troubleshooting, 66-71 
DocumentRoot directives, 130 
domain controllers, 165 
domain name space, 38 
drop-in replacements, 22 
Drupal, 145-149 
configuring, 148 
installing, 146-148 
apt-get, 146 
from source, 147 
DSOs (dynamic shared objects), 124 
DVD-Rs and DVD+Rs, 245 
dvd+rw-tools, 246 
dynamic files, 122 
Dynamic Host Configuration Protocol (see 
DHCP) 
dynamic shared objects (DSOs), 124 


E 


e-accelerator, 162 

echo command, 213 

egrep command, 220 

email client configuration, 120 
email (see mail services) 

error log files, 140 


Exim, 12 
Exim 4, 105 
F 


FastCGI, 123 
Fedora Core, 163, 199, 201 
Feigenbaum, Barry, 164 
file sharing, 164 
enabling between Windows XP and 
98, 167 
filenames, 222 
Files and FilesMatch directives, 133 
Firestarter, 176-180
firewalls 
DMZs and, 174 
DNS and, 48 
gateway and firewall products, 176 
iptables, 174 
screened-subnet firewalls, 174 
(see also gateway services) 
for loop, 223 
FTP services, 34 


G 


gateway servers, 170 
gateway services, 173-180 
group files, 130, 132 


guest, 194 
gzip, 242 
H 


HA (high availability), 155 
headless mode, 12 

heartbeat, 155 

high availability (HA), 155 
high-performance computing, 196 
.htaccess files, 127, 162
.htpasswd file, 130


I


ide-scsi driver, 247

IMAP, 22-32, 119 

inetd, 16 

InnoDB Hot Backup, 256 

install_ispconfig directory, 77, 80 

I/O redirection, 215

IP-based virtual hosts, 138 

IPCop, 176 

ipopd-ssl, 119 

iptables, 174 

IPv6 addressing, 172 

IPVS (IP Virtual Server), 155 
configuration, 155 

ISO image files, 246 

ISO-9660 filesystem, 246 

isomd5 bash script, 250 

ISPConfig, 73-96 
Apache server compilation, 78 
clients and web sites, adding, 83 
directory structure, 82 
email clients, configuring, 95 
email management, 91 
hierarchical model for web site files, 89 
installing, 74 
procedures on compilation failure, 80 
requirements, 74 
server and users, setting up, 83 
services configured using, 74 
special daemons, 76 
user management, 91 
web site setup, 83 


K 


K3b, 246 
KeepAlive directive, 134 
KeepAliveTimeout directive, 134 


L 


LAMP (Linux, Apache, MySQL, 
PHP/Perl/Python), 123 
LB (see load balancing) 
ldirectord, 155, 156
libc client, 11
lighttpd, 162 
Linux system administration 
job opportunities and 
responsibilities, 4-7 
required skills and knowledge, 1 
skill sets, 5 


Linux Virtual Server, 154 
Listen directive, Apache, 130 
load balancing, 154-162 
example configuration, 155 
high-availability, adding, 161 
IPVS, 155 
lb server configuration, 158 
ldirectord, 156
software for, 155 
testing, 159-161 
local network services (see network services) 
Location directive, Apache, 133 
loops, 223 
LPD and LPRng, 182 
LVS-NAT, LVS-DR, and LVS-TUN, 157 


M 


mail command, 111 
mail delivery agents (MDAs), 103 
mail services, 22, 102-121 
email client configuration, 120 
IMAP, 119 
POP3, 119 
setup, 22-32 
SpamAssassin, 36
testing, 110 
mail transport agents (see MTAs) 
mail user agents (MUAs), 103 
maildir format, 119 
maildir versus libc clients, 11 
masquerading, 174 
MaxClients directive, Apache, 134 
MaxRequestsPerChild directive, 
Apache, 134 
mbox storage format, 119 
MDAs (mail delivery agents), 103 
POP3 and IMAP, 119 
memcached, 162 
mkisofs command, 248 
mod_expires, 162 
mod_php, 125 
mods-enabled directory, 127 
mod_vhost_alias, 139 
monit, 97 
installing and configuring, 98-101 
MTAs (mail transport agents), 11, 12, 103 
MUAs (mail user agents), 103 
mutt, 111 
MySQL, 20, 125 
data backups, 254-256 
InnoDB Hot Backup, 256
mysqldump, 256
mysqlhotcopy, 255 
mysqlsnapshot, 255 

root user password, setting, 126 


N 


name-based virtual hosts, 139 
named, 40, 47 
function, checking, 44 
nameservers, 38 
NAT (Network Address Translation), 174 
Netfilter, 176 
netsetup.exe, 167 
Network Address Translation (NAT), 174 
Network File System (NFS), 167 
network services, 163-168 
configuration, 165 
cross platform file sharing, 
configuring, 167 
distributed filesystems, 164 
internet gateways (see gateway services) 
packaged gateway and firewall 
products, 176-180 
print services (see print services) 
Samba, 164 
user management (see user management) 
NFS (Network File System), 167 
NTP (Network Time Protocol) services, 36 





O


open relaying, 103 

OpenSSL, 115-118

operators, 218 

optical media, 245-251 
cdrecord package, 246 
ide-scsi driver, 247 
ISO image files, 246 
verifying recordings, 250 

output, 212 


P 


passwd command, 186-189 
adding a user, 186 
disabling a user, 189 
password file, 227 
PAT (Port Address Translation), 174 
pathnames, 213 
paths, 213 
default path, 214 
percent (%), 141 


Perl, 36 
Apache module, 123 
script example, 230 
SpamAssassin, installing modules needed 

by, 36 

permissions, 213 

PHP, 125 
Apache module, 123 
module-specific directives, 138 
script example, 232 

pipes, 215 

POP3, 22-32, 119 

Port Address Translation (PAT), 174 

postconf command, 27 

Postfix, 22-32, 105 
configuration, 108-110
Debian packages for, 105
installing, 106-108

pound (#) sign, 169

print services, 181-186
cross-platform printing, 183 
CUPS (see CUPS) 
networking hardware types, 181 
print queue control via command 

line, 185 

printing software, 182 

ProFTPD, 34 

Projektfarm GmbH, 74 

prompt, 212 

Python, 233 


Q 


quotas, 17 


R 


radvd, 172 

realservers, 157 
configuring, 157 

refresh values, 48 

relational databases, 20 

remote login, 12 

replication, 162 

resolv.conf file, 40, 47, 178 

resolver, 40 

restores from backups, 241 

retry values, 49 

root directories, 38 

root servers, 45 

root user, 11 

round-robin DNS, 155 

rsend, 262
rsync, 236, 237-240
backup server, listing files on, 240 
restores from backups, 241 
sending files between shell sessions, 261 
syntax and options, 237 

rule files, BIND, 47 


S 


Samba, 164, 184 
SASL (Simple Authentication and Security 
Layer), 23, 111-115 
scalability, 154 
screen command, 268 
screened-subnet firewalls, 174 
scripting, 211, 226 
bash example, 228 
bash (see bash) 
Perl example, 230 
PHP example, 232 
Python example, 233 
scripting languages, choosing, 234 
troubleshooting scripts, 221 
Secure Shell 
disabling access, 189 
Secure Sockets Layer (see SSL) 
security, 96-101 
chroot environments, 18, 42 
daemon-monitoring daemons, 96 
DNS and BIND, 42 
mail services, 23 
Sendmail vulnerabilities, 103 
spam, 103 
self-signed certificates, 143 
SELINUX, 199 
Sendmail, 103 
versus Exim, 12 
vulnerabilities, 22 
serial number, 48 
Server Message Block (see Samba) 
server setup, 8 
Apache, 33-34 
components, 9 
Debian installation (see Debian) 
DNS servers (see DNS) 
FTP services, 34 
headless mode, 12 
mail services, 22-32 
SpamAssassin, 36 
network configuration, 13 
NTP services, 36 
relational databases, 20 
remote login, 12 


requirements, 9 
system clock synchronization, 36 
user, root, and postmaster accounts, 11 
web hosting services (see ISPConfig) 
web statistics summarization, 35 
weight, 159 
server-side includes, 122, 134-138 
shares, 164 
shell scripts, 211 
shell variables, 220 
Shorewall, 176 
silos, 195 
Simple Authentication and Security Layer 
(SASL), 23 
simultaneous multi-threading (SMT), 196 
sites-enabled directory, 127 
SMB (see Samba) 
Smoothwall, 176 
SMT (simultaneous multi-threading), 196 
smtpd.conf file, 27 
SpamAssassin, 36 
spammers, 104 
Squid, 162 
ss script, 268 
SSH clients, 12 
ssh command, 268 
remote administration using, 12 
SSI (see server-side includes) 
SSL (Secure Sockets Layer), 23, 115-118, 
142 
certificate and key generation, 27 
https, 119 
standard input, standard output, and 
standard error, 215 
static files, 122 
static IP addressing, 10, 172 
static linking, 124 
su command, 11
suEXEC, Apache, 78, 143 
sysconfig.txt, 175 
system administration requirements, 4-7 
system clock synchronization, 36 
system data, 237 
system-config-securitylevel program, 200 


T 


tar archives, 76, 236, 242-245 
backup to tape (see Amanda) 
-c and -x options, 245 
creating an archive, 243 
example packing and unpacking, 244 
extracting files from archives, 243
file extensions used in, 242
tar command syntax and options, 242 
tarballs, 76 

Timme, Falko, 74 

TLDs (see top-level domains) 

TLS (Transport Layer Security), 23, 

115-118, 142 

top-level domains, 38, 45 

touch command, 171 

Transport Layer Security (see TLS) 


U


UBEs (unsolicited bulk emailers), 104
Ubuntu, 204
UDF (Universal Disk Format), 246
Ultra Monkey, 156
UML (User-Mode Linux), 196
unsolicited bulk emailers (UBEs), 104
until loop, 223
User and Group directives, 129
user data, 237
user files, 130-132
user management, 186-193
adding users
bash shell script, 257
graphical user managers, 191
user removal, 189
home directories, locking, 190
Secure Shell access, disabling, 189 
useradd command, 186 
User-Mode Linux (UML), 196 
uw-imapd-ssl, 119 



V 


variables, 217 

Venema, Wietse, 102 

virtual hosting, 16, 138-140
mod_vhost_alias, 139

virtual servers for load balancing, 157-159

virtualization, 194-196
advantages and benefits, 197-199
future potential, 210 


high-performance computing, 196 
VMware (see VMware) 
Xen (see Xen) 

vlogger, 141 

VMware, 194, 204-209 
guest operating system installation, 209 
installing, 204 


W 


web hosting services (see ISPConfig) 
web servers (see Apache) 
web services, 122 
CGI, 123 
LAMP setups, 123 
MySQL database, 125 
scalable software, 162 
static and dynamic files, 122 
troubleshooting, 149-153 
web statistics summarization, 35 
Webalizer, 35, 142 
weight, 159 
while loop, 223 
Windows file sharing in Linux 
environments, 166 
wodim, 246 


X 


Xandros, 165 
Xen, 194, 199-204 
guest hosts, installing, 201 
installation, 199 
requirements, 199 


Y 
yum, 199 


Z 


Zmanda Recovery Manager for MySQL, 256 
zone files, 44